Exploring University Mathematics 1 N.J.Hardiman Bedford College, London The Commonwealth and Inter- national Library of Science Technology Engineering and Liberal Studies m x ■o 3 c 3 <" —I 3" 3 0) o' (J) 510, I THE COMMONWEALTH AND I NTER NATIONAL LI BR ARY Joint Chairmen of the Honorary Editorial Advisory Board SIR ROBERT ROBINSON, O.M.. F.R.S.. LONDON DEAN ATHELSTAN S PI LH AU S, M I N N E S OTA Publisher: ROBERT MAXWELL, M.C., M.P. MATHEMATICS DIVISION General Editors: W. J. LANGFORD. E. A. MAXWELL EXPLORING UNIVERSITY MATHEMATICS 1 EXPLORING UNIVERSITY MATHEMATICS 1 LECTURES GIVEN AT BEDFORD COLLEGE, LONDON by P. CHADWICK J. R. ELLIS J. H. E. COHN G. T. KNEEBONE H. G. EGGLESTON M. LEVISON S. J. TAYLOR Edited by N. J. HARDIMAN PERGAMON PRESS OXFORD • LONDON • EDINBURGH • NEW YORK TORONTO • SYDNEY • PARIS • BRAUNSCHWEIG E»fe7 \^o5g- 5/o Pergamon Press Ltd., Headington Hill Hall, Oxford 4 & 5 Fitzroy Square, London W.l Pergamon Press (Scotland) Ltd., 2 & 3 Teviot Place, Edinburgh 1 Pergamon Press Inc., 44-01 21st Street, Long Island City, New York 11101 Pergamon of Canada Ltd., 6 Adelaide Street East, Toronto, Ontario Pergamon Press (Aust.) Pty. Ltd., 20-22 Margaret Street, Sydney, New South Wales Pergamon Press S.A.R.L., 24 rue des Ecoles, Paris 5 e Vieweg & Sohn GmbH, Burgplatz 1, Braunschweig Copyright © 1967 Pergamon Press Ltd. tion 1967 1 3 NOV 198fe This book is sold subject to the condition that it shall not, by way of trade, be lent, resold, hired out, or otherwise disposed of without the publisher's consent, in any form of binding or cover other than that in which it is published. (2852/67) • Contents Foreword Editorial 1 SETS AND FUNCTIONS 1 G. T. Kneebone, M. Sc., Ph. D. Reader in Foundations of Mathematics at Bedford College, London. 2 SPECIAL RELATIVITY AND SOME APPLICATIONS 9 J. R. Ellis, Ph.D. Assistant Lecturer in Applied Mathematics at Bedford College. London. 5 SOME PROPERTIES OF INTEGERS AND PRIMES 31 S.J.Taylor, Ph.D. Professor of Pure Mathematics at Westfield College, London. 4 WAVES 44 P. Chadwick. Ph. D. Professor of Mathematics in the University of East Anglia at Norwich. (Formerly Senior Lecturer in Applied Mathematics at Sheffield University.) 5 SQUARE FIBONACCI NUMBERS 69 J.H.E.Cohn, M.A., D.Phil. Lecturer in Mathematics at Bedford College, London. VI CONTENTS 6 DIGITAL COMPUTERS AND THEIR APPLICATIONS M. Levison, Ph.D. Lecturer and Acting Head of Department of Computer Science, Birkbeck College, London. 7 THE ISOPERIMETRIC PROBLEM H. G. Eggleston, M. A., Sc. D. Professor of Pure Mathematics at Bedford College, London. 83 95 Foreword By Professor H. G. Eggleston Head of the Mathematics Department, Bedford College, London Tun lectures in this book were given at the 1965 Bedford College Easter Conference in Mathematics. These confer- ences have been held annually since 1963 when we initiated the programme with what we believe to have been the first conference of its kind held by any university in Britain. Our aims were (a) to increase the contact between schools and universities, (b) to enable pupils from different schools to meet and discuss topics of common interest and (c) to stimulate interest by introducing students in their last or penultimate years at school to branches of mathematics which would be novel to them. The lectures are primarily designed for students about to embark upon a degree course of which mathematics is a major part. As well as students, teachers of mathematics have also taken part in the conferences. Although those attending are drawn from schools in all parts of the country the number involved each year is, unfortu- nately, very limited. This year the organizers of the confer- ences felt that these lectures, given by professional mathe- maticians on subjects of current mathematical interest and yet assuming little mathematical background, would be of interest to a wider public. It was therefore decided to publish them in a book and so increase the "audience" many times. vn Editorial The seven lectures comprising the chapters of this book formed the programme for the 1965 Easter Conference in Mathematics at Bedford College, London. Those attending the conference were mostly in their last year at school and intending to read for a degree in mathematics at university the following year, or teachers concerned with this level of work in the schools. The scope of the lectures is fairly wide and is divided between pure mathematics and applied mathematics, with a natural bias towards the former at this level. Each lecture is quite independent, so that getting "lost" in one lecture does not mean that a subsequent lecture is unintelligible. (This, of course, is less important in the book, as the reader has time to take each lecture as slowly as necessary for complete comprehension.) Wherever possible, a list of suggestions for further reading is given at the end of the chapter. Four of the lectures were given by members of the Mathe- matics Department of Bedford College, whilst for the other three we were very pleased to welcome staff from other colleges or universities who were interested in taking part in the conference. Each lecturer chose a subject in which In- is an expert, either as a teacher or as a research worker. As far as possible in each conference, at least one lecture is given on the lecturer's research work, although the topics which may be presented at this level are rather limited. This year Dr. Cohn talked about a paper which he had published as recently as 1963. Although the problem on Square Fibonacci Numbers is simple to state and can be IX EDITORIAL solved by elementary methods, the solution has eluded able mathematicians for many years and was only discovered by the exercise of considerable ingenuity. Professor Eggleston's lecture on the Isoperimetric Problem was given to an invited audience of the more advanced pupils (i.e. those who had already passed A-level), the school teachers and university lecturers. The mathematical level of this lecture is slightly higher than that of the other six which comprised the main programme for the conference, but taken at the reader's own pace it should be comprehensible and of considerable interest to all readers. I should like to thank all those who have taken part in the writing and proofreading of the book, and the Pergamon Press for the care which they have given to the production of the book. N. J. Hardiman Bedford College, London June, 1965 CHAPTER 1 Sets and Functions G. T. Kneebone As everyone knows who has studied the calculus, one of the most important concepts in mathematics is that of functional dependence. In this lecture I propose to glance at the general notion of function in order to see how it has been reinterpreted as a result of investigations made in recent times into the foundations of mathematics. Up to about 50 years ago it was taken generally for granted that the business of the mathematician is simply to do mathematics, that is to say either to use his expert knowledge in applying mathematical techniques to problems of science or engineering or else to extend mathematical knowledge itself by developing entirely new theories or adding further details to theories already in existence. From time to time a philosopher or philosophically-minded mathematician might crop up, who would ask questions concerning what mathe- matics is about or what constitutes valid mathematical proof — but such persons were thought of as being very much on the fringe of the subject. But today all this is radically changed and foundational studies are accepted, and even welcomed, as belonging to the main body of mathematics. Indeed, some of the most spectacular advances in modern mathematical research are due as much to the efforts of mathematical logicians as to those of mathematicians of the more traditional kind. 2 EXPLORING UNIVERSITY MATHEMATICS 1 Such a state of affairs is not wholly new, for at certain crucial moments in the history of mathematics accepted mathematical ideas have turned out to be less transparent than had hitherto been supposed. It will be sufficient to recall two such episodes. (1) In ancient Greece, whenever mathematicians dealt with magnitudes they envisaged these not as abstract numbers but rather as lengths or areas; and the earlier Greek geometers thought it obvious that any line which can be obtained from a given line by simple geometrical construction can be measured exactly as a multiple of a sufficiently small sub- multiple of that line. But eventually this belief was dispelled by Pythagoras, who demonstrated that the diagonal of a square is not commensurable with the side (i. e. that the square root of 2 is an irrational number). This discovery of Pythagoras revolutionized the whole conception of magni- tude and led to the creation of the famous theory of proportion contained in Book V of Euclid's Elements, and also in a later age, much nearer our own day, to Dedekind's theory of real numbers. (2) The subject of geometry was conceived by Euclid and his contemporaries as a study of the properties of space as it really is — properties that are crudely exemplified in our physical constructions and measurements — and geometry was accordingly presented as a deductive theory based upon a small number of principles that were accepted as indubi- table. Such a view of the nature of geometrical knowledge endured for more than 2000 years; but it had, nevertheless, to be discarded when non-euclidean geometries were found to be a theoretical possibility, and Euclid's set of axioms became only one system among many possible ones. From that time on, euclidean geometry has been interpreted as a hypothetical theory instead of a factual theory of objective reality — and the axiomatic method, as thus modified, today dominates the whole of pure mathematics. SETS AND FUNCTIONS 3 Discoveries of a foundational character can thus have far-reaching repercussions within mathematics itself; and we shall now see how this has happened with the concept of function. To reach adequate understanding of a mathe- matical concept or a mathematical theory, of any but the most abstract kind, one must learn something of its history and something of its logic. On the whole we would expect the logic to count for more than the history — from a strictly mathematical point of view of course, though not necessarily from the rather different point of view of philo- sophy. In the present instance I shall begin with history, since the concept of function is not so much a single abstract idea as an orientation of mind that has found different expression in different ages. For the Greeks, the idea of function did not yet exist. The Greek theory of magnitudes (as presented by Euclid, for example) is essentially a static theory, concerned with discrete quantities and fixed relations between them, whereas the concept of a function arose in the first instance out of the consideration of quantities which are susceptible of variation in accordance with some fixed law. The mathe- matical concept in fact came into being in the seventeenth century, in close association with a radically new frame of thought in natural science. Whereas ancient and mediaeval science had been qualitative the new science was quantitative, and the aim of scientists was now to discover mathematical laws which characterize the interdependence of the physical quantities in terms of which the relevant state of a given system can be described. In some simple cases the relation- ship in question can be expressed by an algebraic equation (as in the case of Boyle's law pv = k) but more usually a differential equation is needed. In either case, however, the physical law is identified with a functional relationship. As long as we are thinking of dependencies between physical quantities, the ideas of variable and function seem very natural. The physical magnitude, e.g. the pressure of 4 EXPLORING UNIVERSITY MATHEMATICS 1 a gas, is something which can be directly measured whenever required, and which then has a well-defined value. And one quantity may be considered as a function of another when the value of the first is uniquely determined by the value of the second. But when we translate these considerations into the language of pure mathematics we no longer have any observable or measurable quantities to fall back upon, and our "variables" are then mere phantoms. Indeed, one can argue that the very notion of a variable quantity, conceived abstractly, is a contradiction in terms. It was in fact argued long ago by Zeno of Elea, in one of his famous paradoxes, that a flying arrow can never be moving, since at any instant it occupies precisely the space in which it is situated. For a long time mathematicians managed to evade the difficulty concerning the nature of abstract variables by relying upon geometrical intuition; and since they were already very much at home with geometrical relationships they were able to develop a detailed theory of functions by such means. Functions were thought to be represented adequately by their graphs; and a function was said to be continuous, for example, if its graph was free from breaks. On this basis a simple proof could be given of the fundamental theorem that if a continuous function / is positive for x = a and negative for x = b then there is a number c between a and b such that f (c) = 0. For we cannot pass from above the ^r-axis to below the #-axis by a continuous path without crossing the axis at least once. Gradually, however, the more careful mathematicians came to distrust this kind of semi-intuitive reasoning; and Cauchy and Gauss, in particular, tried to reformulate analysis (i. e. the theory of functions) strictly in terms of abstract numbers. While almost all the familiar theorems continued to be accepted as valid, this was only because ways were now found of proving them in a rigorous manner. Criticism of the intuitive approach to mathematics was pressed even further by Dedekind and Peano, both of SETS AND FUNCTIONS 5 whom were pioneers in raising the critical study of the foundations of mathematics to the dignity of an independent and essential discipline. One of Dedekind's decisive acts was his uncompromising rejection of the intuitive notion of continuity as an adequate basis for a theory of continuous functions; and it is mainly due to this initiative that the sort of plausibility argument that is exemplified by our earlier "proof" that any continuous function which changes sign must attain the value zero is no longer acceptable as a mathematical demonstration. It was not Dedekind, however, but Peano who eventually dealt the death-blow to intuitive analysis by giving an analytical definition of a space-filling curve. According to the traditional view every function has a graph, and every pair of equations x = f (t), y = g (t), in which / and g are given functions and t is restricted to some given range, defines an arc of a curve. Peano devised a pair of functions f and g with the property that, for any xo such that < Xq < 1 and any y such that < y <, 1, there is a t such that 0<t ^l, x = f (to), and y = g (to). In other words he defined parametrically a "curve" which passes through every point of the square region bounded by the four straight lines x = Q, x = I, y = 0, and y — 1. Since a curve in the intuitive sense is a locus with length but no breadth, it cannot fill up an area; and there is thus a fundamental incompatibility between the intuitive notion of curve and the more formal notion of the locus determined by a parametric representation. The moral of this, as Peano saw it and as the entire mathematical world came by degrees to see it also, is that if mathematical theorems make statements about functions in an abstract sense they cannot be proved by appeal to what is "obviously" the case for curves in a totally different intuitive sense. Since 1890, the year of publication of the paper in which Peano defined his anomalous curve, mathematicians have systematically purged the theory of functions of every vestige 6 EXPLORING UNIVERSITY MATHEMATICS 1 of reliance on geometrical ideas; and the theory in the form in which it is normally presented today is wholly abstract. But since, even in the most formal mathematics, we cannot produce something out of nothing, we must begin with something that is admitted as logically prior to mathematics, even though it can no longer be supplied by spatial intuition. There is now fairly general agreement that what we must presuppose consists of (1) the basic principles of logic, which are implicit in the very conception of deductive reasoning, and (2) certain very general notions such as that of set, which are themselves closely allied to abstract logic. Using only such materials, which lie at the furthest extreme of generality and abstractness, we can in fact refashion not only the theory of functions but the whole of pure mathematics. A function is essentially a correspondence, whereby a value jjq of the dependent variable y is associated with any arbitrarily chosen value xq of the independent variable x, and it is thus uniquely determined by the totality of all such associated pairs of numbers {xo, yo). If we denote by R the totality or set of all (real) numbers, and by R X R the set of all ordered pairs (x, y) of two real numbers x and y, every function / gives rise to a certain subset F of R X R, namely the set of all ordered pairs (x, y) such that y = / (x). And conversely, every subset F of R X R with the property that, for every number x, there is precisely one number y such that the ordered pair (x, y) belongs to F, determines a function /. The concept of function is thus definable in set-theoretic terms; and the same is true in principle of every other mathematical concept — though it may well be that the set-theoretic definition of what appears to be a simple intuitive concept is formidably elaborate* In discussing the concept of function we have taken the concept of number for granted, but naturally this concept also must be made independent of intuition by abstract set-theoretic definition. How this is done is, in fact, another story, and by no means a short one. I have already referred SETS AND FUNCTIONS 7 to Dedekind's theory of real numbers, which provides the modern answer to Pythagoras's problem of the irrationality of such numbers as j/2; and this, or some equivalent theory, is needed as part of our analysis of the number system. Dedekind defined real numbers in terms of rational numbers (i. e. ratios or fractions plq) and there is no difficulty in defining the rational numbers in their turn in terms of the 'natural numbers" 0, 1, 2, — If we can once introduce the natural numbers in a satisfactory way, then all is relatively plain sailing. Detailed examination of the use that is made of the natural numbers in mathematics reveals that all that needs to be known concerning these numbers is that they constitute a progression. In other words, we can generate the entire system of natural numbers progressively by starting with and repeatedly passing from n, the last number so far obtained, to its successor n', i. e. the number n + 1. One way of introducing the natural numbers into mathematics — a way favoured by both Dedekind and Peano — is to postulate the initial number and the successor operation as given, and then to adopt axioms which confer on the set of natural numbers the structure of a progression. This is precisely the function of the well-known Peano axioms for the natural numbers: (1) is a natural number. (2) If n is a natural number, the successor n of R is also a natural number. (3) No two natural numbers have the same successor. (4) is not the successor of any natural number. (5) If P is a property such that (a) has the property P, and (b) if n has the property P then so also has n, then every natural number has the property P. (The axiom of mathematical induction.) This axiomatic way of introducing the natural numbers s EXPLORING UNIVERSITY MATHEMATICS 1 is completely satisfactory for mathematical purposes. But if we do not wish to postulate special concepts such as those of zero and the successor operation in addition to the general concepts of logic and set theory, we can adopt the alternative course of defining a particular progression in set- theoretic terms and then using this always to stand for the progression of natural numbers — which it can perfectly well do since the natural numbers have no mathematically signi- ficant features beyond that of being a standard progression. In fact we use the standard progression very much as the progression 0, 1, 2, . . ., 10, 11, . . , of numbers in the scale of ten is used in more intuitive mathematics?. The customary definition of the natural numbers as sets runs as follows. We denote by the empty set (i. e. the zero totality which has nothing at all belonging to it); and if a.b,...,k are given entities we denote the set of these entities bv {a, b, . . ., fc}. Then we may form the progression of sets 0, {0}, {0.{0», {0,{0} J {0>{0>}}'--" by taking first the empty set 0, and then at each stage introducing a further set whose members are all the sets previously introduced. The sets of this progression are then defined to be the natural numbers, and from this point on they are represented by the familiar symbols 0, 1, 2, 3, — Such, in brief, is the transformation that mathematics has undergone as the result of the criticism to which its founda- tions have been subjected in modern times. And such is the resulting conception of mathematics that is now presupposed as a matter of course in current research in this field. References for further reading KNEEBONE, G.T., Mathematical Logic and the Foundations of Mathematics, London 1963 (especially Chapter 5). ROTMAN, B. and KNEEBONE, G. T., The Theory of Sets and Trans- finite Numbers, London, 1966. CHAPTER 2 Special Relativity and some Applications J.R.Ellis The special theory of relativity was proposed by Einstein in 1905. During the early days of relativity, his work went under a rather different heading from the title we give it today. The original paper (Ann. Phys. Lpz. 17) by Einstein was called "Electrodynamics of moving bodies". The subject has since been called the special theory of relativity, or the restricted theory, to differentiate it from his second theory, the general theory, which he wrote in 1915. The latter theory was a theory of gravitation, based on concepts he introduced in his special theory. While the general theory could never be regarded as a mere application of the special theory, the special theory has, nevertheless, many direct applications, so much so in fact, that this theory is recognized without question as a corner-stone of modern physics. Mathematically, the special theory of relativity is really a geometry: a certain kind of geometry of four dimensions which connects three spatial coordinates x, y, z and a time coordinate t. It is not surprising that it is still important in the development of present theories. Relativity is construed as an essential part of physics, and all theories must fit in with this four-dimensional framework and obey the so called relativity principle. 9 10 EXPLORING UNIVERSITY MATHEMATICS 1 In order to arrive at this four-diinensional geometry which has just been mentioned, and to explain the relativity prin- ciple, we must first go back to Newton, or at least his conception of space and time. His own framework, on which he founded his three laws of motion, supposes the existence of an absolute euclidean three-dimensional space and an absolute universal time, which is common to all observers independent of their position or motion in space. Space and time are separate entities. They can be envisaged indepen- dently of one another. Suppose a particle moves with uniform velocity in one dimension (which we may take to be the x-axis). In a space-time diagram, the motion is recorded as a straight line (Fig. 2.1). We can also record a two-dimensional motion in a space- time diagram without difficulty. For instance, if a particle moves in a circle with constant speed, the space time diagram turns out to be a helix (a curve in the shape of a spiral advancing along an axis) (Fig. 2.2). However, when we try to represent three-dimensional motion in a space-time diagram we get into difficulty. We should need to go into four dimensions to represent it. But SPECIAL RELATIVITY AND SOME APPLICATIONS t 11 •-y Fie. 2.2. this is still mathematically feasible; and in fact we often do work with a space of four dimensions in this way, although it is not possible to represent the situation dia- gramatically. The kind of geometry to use between x, y, z, t now becomes an important question. In classical mechanics, space-time diagrams are merely representative diagrams, although we can imagine them based on euclidean geometry by a suitable definition of the word "graph", without any loss. However, for the purpose of bringing the work closer to Einstein, we must be prepared to read a little more into the geometrical relationship con- necting x and t and to acknowledge that in the end it may turn out not to be euclidean. But let us first of all see how, on the basis of euclidean geometry between x and t in Fig. 2.1, the newtonian picture represents, for instance, the motion of light, in one dimension. Light travels with a constant velocity of approximately 3 X 10 10 cm/sec, in a straight line. The magnitude of this velocity is commonly denoted by c. It would be useless to try to draw a line representing the motion of a light pulse on the space-time diagram of Fig. 2.1, since the slope of the line would be t 1 x ~ 3 X 10 10 ' 12 EXPLORING UNIVERSITY MATHEMATICS 1 and this would be so small that the line would be indistin- guishable from the axis of x. This, of course, would not conflict with Newton's opinion that light travelled instanta- neously, but rather than accepting this, since we now know otherwise, let us agree to multiply the units in which the time t is measured, by c. Then the motion of a light pulse moving along the #-axis would be represented by a straight line with unit slope [Fig. 2.3 (a)] . We are now measuring time T (= ct) in light-seconds (the distance in centimetres tra- velled by light in one second). A light-year (the distance travelled by light in one year) is approximately equal to 3- 16 X 10 7 light-seconds. I (=cl) \ Fig. 2.3. The dotted line in Fig. 2.3 (a) indicates the motion of a light pulse emitted in the opposite direction from O. It is not difficult to see that a ray of light emitted in both directions from O for a finite time interval between T = and T = a (say) would be represented as a region of the space-time diagram [Fig. 2.3 (b)]. If a ray of light shines in both directions for an infinite time from T = to T = °o (a = °o) the corresponding region is easily obtained [Fig. 2.4 (a)]. This part of the space-time SPECIAL RELATIVITY AND SOME APPLICATIONS 13 diagram is called the future region from O, because all points, or rather events (x, T), within it can be arrived at from O without having to travel at a speed greater than that of light (the slope of all lines from O to points in the region is numerically greater than unity). It is, of course, impos- sible, from a physical point of view, to have speeds greater than that of light. The future region thus constitutes the totality of all events which are accessible from O. FIG. 2.4. In a similar way, the region bounded by light pulses tra- velling towards the origin is called the past region from O [Fig. 2.4 (b)], and represents the totality of all events which may communicate with O. The remaining region is called the present region with O and is made up of all those events which are neither acces- sible from O nor communicable with O. We then have the complete picture, built up from light pulses travelling towards and away from the origin [Fig. 2.5 (a)]. The terminology "past", "present", "future" is used in anti- cipation of special relativity, where it is quite common. At the moment, the terms past, present, future are relative only. They are entirely dependent on the fact that O is at rest with respect to absolute space. If we consider a three-dimensional space-time diagram, representing a two-dimensional motion in the plane of x and 14 EXPLORING UNIVERSITY MATHEMATICS 1 T Fig. 2.5. y, the past, present and future regions of this diagram are separated by an infinite cone, called the light-cone, whose semi-angle has unit slope [Fig. 2.5 (b)]. Any cross-section of this diagram through the time-axis brings us back to a two- dimensional diagram [Fig. 2.5 (a)]. For four-dimensional space-time, the same terminology is used, although the light-cone in question is no longer a two- dimensional surface but a three-dimensional one. Fig. 2.6. SPECIAL RELATIVITY AND SOME APPLICATIONS 15 Before we introduce the idea of relativity, we have a fairly good application of the things we have mentioned so far, which we might consider for a minute: the past history of the universe. Let us choose coordinates by supposing that our Milky Way, the nebula which we, in the solar system, inhabit, is at the event x = 0, y = 0, z = 0, T = 0. We take a cross-section in the ^-direction and represent the situation as shown in Fig. 2.6. Consider the light L which travels to us from the distant parts of the universe, in fact, from the distant nebulae, labelled by the iV's. The distant nebulae are understood to be receding from us with speeds which vary in proportion to their distance from us.f If we assume that the nebulae have always been moving with uniform speeds equal to their present speeds, extrapolation of their path lines in the space-time diagram will reveal that they appear to emanate from one common event in the past. This event is at (0, — cE) where H = 4-1 X 10 17 sec (ot 1-3 X 10 10 years). (It will be recalled that we are measuring the time T in light-seconds.) It is as though all the nebulae were packed tightly together 1-3 X 10 10 years ago and then moved apart as though an explosion had occurred there. It is custo- f A galaxy's speed (the word "galaxy" is synonymous with "ne- bula") can be estimated by analysing the spectrum of the light received from it. The so called "red-shift" — the apparent shift of the entire spectrum in relation to known patterns of absorption lines (which indicate the presence of certain elements within the vicinity of the galaxy) — is interpreted as a Doppler effect, the diminution of frequency which is observed when a source is mov- ing away from us. The effect is well known in the study of sound vibrations. The sudden drop in the pitch of a whistle of a train as it rushes past us is a common example of this, involv- ing a raised frequency (advance) and a lowered frequency (re- cession). Similar principles apply equally well in the case of light waves, e. g. an ordinarily green light when travelling with a speed of about c/5 would appear blue if travelling towards us, and red if travelling away from us. 16 EXPLORING UNIVERSITY MATHEMATICS 1 SPECIAL RELATIVITY AND SOME APPLICATIONS 17 mary to say that H, on the basis of this very naive picture, is the age of the universe, f It is interesting to compare this figure, which is arrived at from a purely optical approach, with the age of the earth determined from the analysis of radioactive rocks. This age is about 2-6 X 10 9 years, possibly higher. (At least on the basis of the space-time model the age of the earth is not greater than the age of the universe!) We return to the earth to describe a certain relativity of motion which was known to Newton. So far the measure- ment of relative motion has not been mentioned. This is precisely where relativity comes in. We shall see that new- tonian relativity, as it is sometimes called, fails in its attempt to describe relative motion and moving sources of light. The reasons for its failure lie in the assumption of an absolute space and an absolute time. Newtonian relativity is based on the equations x = x - Vt, y = y, z' = z, which connect two frames of reference: (x, y, z) which is fixed in absolute space, and (x\ y', z') which is moving in the ^-direction with constant velocity V (Fig. 2.7). Newton simply observed that his famous law of dynamics force = mass X acceleration docs not change its character when we pass from the first coordinate system to the second. This is because the law is independent of velocity. The essence of this statement is that if we perform a dynamical experiment when we are travelling with a constant velocity V with respect to the t Cosniological theories which envisage a definite point of origin usually incorporate II as the age of the universe. There are other cosmological theories which envisage no definite origin. A dis- cussion of each type of theory is unfortunately beyond the scope of this lecture. earth, the result we get would be the same as that produced by an identical experiment at rest on the ground (assuming for the moment that the earth is a good enough representation of absolute space: we shall come to this shortly). Velocity, dynamically speaking, is not an absolute quantity, but purely relative, whereas acceleration is absolute Since Newton's second law is referred to as the "law of inertia", a coordinate system in which the law is valid is called an inertial frame. By the principle of newtonian relativity, an absolute coordinate system can be substituted by any one of a whole set of inertial frames; the significant point is that no one of these frames is more important than any other. On the basis of newtonian relativity (which it will be remembered will fail in its attempt to describe kinematics satisfactorily), let us see how the moving frame of Fig. 2.7 2 Z' Fig. 2.7. records a light pulse emitted in the usual way from O in the fixed frame. Consider the situation in the common x-x' direction for simplicity, and set up two space-time diagrams, one for the fixed frame and one for the moving frame. Fig. 2.8 shows how each frame "sees" the light pulse. It is clear that a person moving with the origin O' will see the light pulse travelling with a velocity c — V in the forward direction and c + V in the reverse direction owing to the velocity V of O' with respect to O. 0"s conception of past 18 EXPLORING UNIVERSITY MATHEMATICS 1 SPECIAL RELATIVITY AND SOME APPLICATIONS 19 and future will be different from that of O; and, in general, any optical experiment whicli is carried out in the moving frame will give a different result to that of a similar experiment carried out in the fixed frame. Hence the principle of newtonian relativity holds for dynamics, but not, it seems, for optics. Because of this, and because the motion of light is always uninfluenced by the motion of its source.t T (=ct) T (=ct) x = cl x'= - <c+V)1 X'= (C-V)t Fig. 2.8. it should be possible to measure the earth's velocity with respect to absolute space by doing a suitable optical experiment on the earth. Such an experiment was carried out by Michelson and Morley in 1887. They set themselves the task of finding the velocity of the earth with respect to the stationary luminiferous aether which in their time was supposed to characterize and fill the absolute space (or one of the dynamically equivalent inertial frames) devised by Newton, which had since become so crucial from the point of view of optics. Their experiment, it could be said, was designed to detect an "aether wind". t This fact emerges in the observation of "double stars", i. e. two stars moving in orbits around their common centre of mass. If light took longer to reach us from one star than from the other (when the first was receding and the second approaching) this would give rise to anomalies in the observed motions of these stars (viz. under simple velocity addition, circular orbits would appear eccentric), and no such anomalies have been found. This famous experiment of 1887 was a refinement of a previous experiment which Michelson had carried out in 1880 (at that time the sensitivity of the apparatus he used made the result unreliable). Michelson's idea was very simple. It was to send two beams of light from the same source, fixed to the ground, in two mutually perpendicular directions and over equal distances, and then to reflect them back again into the same straight line and measure any difference in the arrival times of the light waves by observing interference fringesf (Fig. 2.9). A = glass plate, half-silvered on its rear surface. M,= mirror, reflecting first beam. B = compensating glass plate having the same thickness as A. This is employed so that the two beams have equal path lengths in glass. M, = mirror, reflecting second beam, LIGHT SOURCE ",-^j„ o Fig. 2.9. The experiment can be compared with a simple problem in relative velocity. It is not difficult to show that it always t When two light waves arrive in step with thedr crests together they reinforce one another and produce a bright fringe (construc- tive interference); when they arrive out of step cancellation takes place and a dark fringe is produced (destructive interference). In practice any difference in the arrival times of the two beams in Michelson's apparatus could be most easily discovered by rotating the whole apparatus through 90° when a fringe shift should be produced, since the roles of the mirrors in the experiment would be changed. 20 EXPLORING UNIVERSITY MATHEMATICS 1 takes longer to fly an aeroplane at a certain rate up-wind and back, over a certain distance, than it does to fly at the same rate perpendicularly across-wind and back, over an equal distance. (The proof is given in an appendix, but the reader should attempt to prove it for himself.) Correspon- dingly, for Michelson's experiment, if an aether wind prevailed, one light beam must travel up-wind and back while the second must travel across-wind and back, and the first beam will arrive later than the second. The result of the Michelson-Morley experiment was quite sensational. The two arrival times were the same. No aether wind existed. The experiment was repeated when the earth was in a different position in its orbit round the sun and the same result was obtained. Again no aether wind. The problems posed by the unexpected result of this experiment led indirectly to Einstein's theory of relativity. Einstein established the idea that the geometry of space-time diagrams (so to speak) was non-euclidean. In this way he was able to revive the principle of newtonian relativity for dynamics and extend it to optics (and electromagnetism) as well, and in doing this, the old idea of absolute space and a luminiferous aether came to be abolished. In short it was supposed that the space-time diagrams of Fig. 2.8 were basically incorrect and ought to be replaced by identical pictures on the basis of the null result of the Michelson-Morley experiment (Fig. 2.10). T (=cl) T'(=cl') Fig. 2.10. SPECIAL RELATIVITY AND SOME APPLICATIONS 21 It was precisely the old transformation rules x' = x — Vt, y' = y, z — z which embodied the idea of euclidean three-dimensional space and absolute time. Instead, new transformations were given, in which the time T (= ct') of the moving system was no longer the same as T of the fixed system, thus dropping the idea that time was something absolute. The light pulse x = T had to be the same as the light pulse x = T', and x = — T the same as x' = — V . This was certainly true. (We follow an argument which Einstein, or his contemporaries, might have considered.) In addition there was no harm in tentatively assuming that the new transformations had an essentially simple character, and it was therefore supposed, on the grounds of simplicity, that x' -T' = h (x - T), (1) x + T = I (x + T), (2) where k and I were constants to be determined. It was reasonable to choose kl = 1 from the point of view of units in the two systems. This assumption led to the equation x .'2 _ T'Z = X 2 - J** [multiplying (1) and (2)] and this equation also contained the geometrical aspect of the transformation: the quantity x 2 — c 2 1- should not change its value on transformation from one coordinate system to another. Physically this meant that the light pulse must present identical pictures to the two frames. (If it had turned out that x 2 + c 2 1 2 were the invariant, then the geometry between x and T would have, indeed, been strictly euclidean, by Pythagoras.) 22 EXPLORING UNIVERSITY MATHEMATICS 1 SPECIAL RELATIVITY AND SOME APPLICATIONS 23 Adding (1) and (2), and subtracting (1) and (2) gives two further equations: h+ l) X -[ k -l) T > 2T ' = [ k + l k ~i u or fc 2 + l x = 2k fc 2 -l x ~k^l T fc 2 + 1 r k 2 - i r = — — lr- 2k k*+i The spatial origin x = must be equivalent to x = Vt (or x = VT/c), since it is travelling with speed V in the fixed frame (the diagram of Fig. 2.7 still applies). The last equation therefore gives fc 2 -l V c an d h k* + 1 1 ence = fi (say). A: 2 + 1 c ' 2 k ]/(i- Wc 2 ) Thus the rules of transformation are x' = (S (x - VT/c), T' = f}(T- Vx/c), (y' = y, z m z), or x' = p(x-Vt), y' = y, z' = z, f = fi (t - Vx/c*), (3) where £ = 1/1/(1 - VVc 2 ). These are the Lorentz transformations of special relativity. It is with these transformations that one associates the principle (really a postulate) of special relativity: "no experiment can ever detect a uniform motion through space." The principle is entirely equivalent to the statement that the laws of nature possess the same form in all frames of reference in uniform motion relative to each other. In particular, the laws of dynamics and the laws of electro- magnetism (including optics) must satisfy (and can be shown to satisfy) this requirement. The equations of transformation (3) themselves produce a variety of consequences apart from the properties of invariance which we have mentioned. (i) When the velocity V between the frames (x, y, z), (x' t y', z') is small compared with the velocity of light, as will be the case for most terrestrial applications, we find that the Lorentz transformations (3) reduce to x'=x-Vt, y' = y, z = z, *'==* (0=1). These are the old transformation rules, formerly based on the notions of absolute space and absolute time (f = t), but now to be regarded as approximations. (ii) If the equations (3) are solved for x, y, z, t in terms of x', y' , z ', f, we find x = p(x' + Vt'), y = y', z = z, t = p (f 4- Vx'/c*) and this has the same form as (3) except that V is replaced by — V. This demonstrates the perfect symmetry between the two frames. No one frame is preferred and no one frame is fixed in any absolute sense. (iii) Simultaneity has no absolute meaning in relativity. If two events (*i, y\, z\, t\), (x2, y2, z& t%) are simultaneous to an observer, i. e. t\ = t^, they will no longer be simultaneous to another observer moving with a velocity V, for it will not be the case that t\ = t{, by virtue of the Lorentz transfor- mations: t{ = P (h - Vx x lc*), U = fi{h- Vxs/c*). These times are unequal. This follows because we have xi =|= X2- (If x% = X2 the two events would be the same and questions of simultaneity do not arise.) (iv) The combination of two successive Lorentz transfor- mations is still a Lorentz transformation. Suppose a first frame is related to a second by the equations (3): 24 EXPLORING UNIVERSITY MATHEMATICS 1 x' = 0(x-Vt), y' = y, z' = z, t' = fi (t - Vx/c 2 ), iff = 1/1/(1 -V 2 /c 2 ), and the second related to a third by a velocity V: x" = P'{x'-V't'), y" = y', z" = z', t" = & (*' - V x'lc 2 ), F = 1/1/(1 - V'W), then, by eliminating x', y', z', t' between the two trans- formations, it is not difficult to show that the third frame is related to the first frame by the equations ^ = /5 "( x -P't), y" = y, z" = z, t" = 0" (t - V" xlc 2 ), where V" (as it turns out) is not merely the addition of the two velocities V+V' as one might at first be tempted to imagine, but is given by a formula V" = {V + V')l{i + VV'/c 2 ), which approximates to V + V' when the velocities are small compared with that of light. It is interesting to notice that if V = c (or V = c) then V" is also equal to c, verifying that the velocity of light can never be increased by "adding" a further velocity to it. £, e, V Fig. 2.11. SPECIAL RELATIVITY AND SOME APPLICATIONS 25 (v) Moving measuring rods appear to contract. This is a curious phenomenon, but it must be a consequence of rela- tivity. Suppose a rod of length I is fixed along the x'-axis of the moving S' system (Fig. 2.11) between the points x' = Xi' and x' = x-z' (so that x% — x{ = I). If the situation is viewed from the frame S at any time t say, the length of the rod x-2 — xi as it appears in this frame can be found from the following formulae: xi = fi (*i' + Vh% t = [it 1 + Vxi'ft?). giving xi = j3xi' + V(t- fiVx'i/c*) = fi (1 - We*) x\ + Vt, i. e. xi = Xl ' V(l -VW) + Vt. Similarly, x 2 = x 2 ' 1/(1 -V 2 /c 2 ) + VU therefore x*-xi = {x 2 ' - xi') 1/(1 - F 2 /c 2 ) = IV{1- V 2 /c 2 ). Hence the rod appears to be shortened by a factor V (1 — V 2 /c 2 ). This is known as the Fitzgerald contraction, (vi) Moving clocks appear to run slow. This is an equally curious phenomenon. Suppose a clock is fixed in the frame S' (Fig. 2.12) at the point (x', 0, 0). Let t % ' and U be the times of two consecutive "ticks" that it makes, and the interval between the ticks t% — i\ — p sec. When the -X--SF 7 —i Fig. 2.12. 26 EXPLORING UNIVERSITY MATHEMATICS 1 SPECIAL RELATIVITY AND SOME APPLICATIONS 27 situation is seen from the frame S the interval is no longer p sec, for h = fi {h' + Vx'/c 2 ), t 2 - P W + Vx'/c 2 ), and so h-h = P W - W) = P/V (1 - VV<*). (4) Hence moving clocks "appear to run slow" by a factor 1/1/(1 — V 2 lc 2 ): this is called the time dilatation effect. The phenomenon has very recently been directly confirmed, experimentally, to within a factor of about 2 per cent.f The technique adopted did not make use of the ticks of a moving clock, but of the very short waves produced by a certain moving gamma-ray source (following the discovery of R. L. Mbssbauer, which enables us to use certain atomic nuclei as very accurate frequency standards). The theoretical principles involved are very much the same as if a moving clock had been used, but, of course, the success of experi- ments on time dilatation depends on the existence of ex- tremely precise methods. The source was mounted at the circumference of a rotor, which was set spinning with a very high angular velocity. The radiation from the moving source was received at the centre by an absorber. Since the frequency of the radiation is inversely propor- tional to its period, we have where t a , t s are the periods, and v a , v s are the frequencies of the received and emitted radiation respectively. The velocity V of the source equals cor, where a) is the angular velocity f Champeney, D. C, Isaak, G. R., and Khan, A. M., Proc. Phys. Soc. 85, 5S3 (March 1965). (Their article also contains references to other recent experiments.) of the rotor and r is its radius. Thus by the binomial theorem, v a co 2 r 2 -==i-a— -. V» c z (In view of the magnitude of c, higher order terms may be neglected.) This theoretical prediction was verified. Finally, we consider some dynamical consequences of special relativity. If a particle of mass m, fixed in a frame S', is moving with a velocity V with respect to a frame S, Einstein supposes that the mass which an observer in S would measure is not m but mtV (i - VVc 2 ) = M (say), rather like the time dilatation (4); m, being the mass which an observer at rest relative to it in S' measures, is called the rest-mass. The quantity M is called the relative mass, and correspondingly MV, the momentum which S measures, is called the relative momentum. In the absence of force and non-dissipative impact, total relative mass and total relative momentum are conserved. These basic principles of relativity mechanics can be illu- strated in the following dynamical problem: suppose two spherical perfectly elastic particles move along their common line of centres and collide. If M\ and M% are their relative masses, and V\ and Vo are their velocities before impact, and the same symbols with dashes denote the respective quantities after impact, the assumption that total relative mass and total relative momentum are conserved yields the equations M ! + M 2 = M x ' + M 2 ', M, Vt + M 2 V 2 = Mi 1 VS + M 2 V 2 . 28 EXPLORING UNIVERSITY MATHEMATICS 1 If V\ and V% are known, the velocities V\' and V 2 ' after impact can be determined, since we have the auxiliary equations Mi = mi/ 1/(1 -F?/c 2 ), Mi = mi/1/ (1 - Vf/c 2 ), M 2 = m 2 /]/(l -F 2 2 /c 2 ), M 2 ' = m 2 7l/(1 -V 2 ' 2 I<?). There are two solutions. One of these is obviously V\ = V\, Vn = ^Y corresponding to the case of no collision. The other is the required solution for collision. In the approximation when V\ and V 2 are small compared with c, the equation for the conservation of mass gives, by the binomial theorem, m (1 + &F1W) + m 2 (1 + W 2 ~Ic 2 ) = m (1 + hVi'Vc 2 ) + m 2 (1 + W 2 ' 2 lc 2 ). If we multiply this equation through by c 2 , we obtain the usual equation for the conservation of energy: hmi V? + hm 2 Vf = imi Vt* + hm 2 V 2 ' 2 . This shows that the conservation of mass and of energy are equivalent and related by E = Mc 2 , the famous Einstein formula. We can see this also by expanding Mc 2 : E = Mc 2 = mc 2 / ]/(! - V 2 /c 2 ) = mc 2 + hmV 2 + higher terms in Vic. rest kinetic energy energy further correctioms In special relativity, Newton's second law is interpreted as force = - (MV), at SPECIAL RELATIVITY AND SOME APPLICATIONS 29 where M is the relative mass. What change does this new law make to existing dynamics? Virtually none for terrestrial phenomena since the velocities encountered are so small compared with the velocity of light. However, the theoretical difference the law makes to the motions of the planets round the sun, in relation to their calculated positions according to newtonian mechanics, is significant in the case of the planet Mercury, being within the fringe of what is measurable. It can be shown that on the basis of the special relativistic form of Newton's second law, the orbit of Mercury is not an ellipse with the sun as focus as classical mechanics determines, but an ellipse which is slowly but constantly rotating about the sun as focus, i. e. an elliptical rosette. The rate of rotation, given by these calculations, is found to be 1\ sec of arc per century, which is an extremely small quantity. Observationally, Mercury does possess this annomaly in its orbit, but the rate of rotation is, in fact, approximately six times as much as special relativity predicts. The added precession is taken to be a consequence and a test of Einstein's second theory, his general theory of relativity, and this derives for Mercury the observed precession to within about 1 per cent. Appendix To show that it takes longer to fly an aeroplane at a certain rate up-wind and back, over a certain distance, than it does to fly at the same rate perpendicularly across-wind and back, over the same distance. Let us suppose that the rate at which the aircraft travels in still air is c m. p. h. If the wind-speed is V m. p. h. however, it is clear that the aircraft's speed is then c — V when it is travelling up-wind, c + V when it is travelling down-wind, and V (c 2 — V 2 ) when it is travelling across-wind (in either direction). 30 EXPLORING UNIVERSITY MATHEMATICS I If we take the first case when the aircraft flies up-wind over a distance of d miles (say) and returns over the same distance, the total time of flight is d d + c-V c+V hours. In the second case when the aircraft flies across-wind and back, the time of flight is 2d V(c*-V 2 ) hours, the time of the outward and return journies being in this case equal. Provided F#=0, the first time is always greater than the second: the inequality 2dc > 2d c z -V 2 V{c*-V*) is valid because c> V (c* — V 2 ). CHAPTER 3 Some Properties of Integers and Primes S. J. Taylor Very early in life we learn to count and we build up a store of experience of the "whole numbers". We learn that we can add them, multiply them and sometimes subtract or divide, and that there is always a correct (or unique) answer for these operations. Our object in this lecture will be to try and formalize some of these notions and to see that many of the tricks of manipulation which we learnt to use withoui fully understanding them can be justified. Laws of algebra We assume that we know the set Z = {0, 1, 2, 3, . . . } of whole numbers, and that the operations of addition and multiplication give a unique answer and satisfy a + b = b + a, ab = ba; (a + b) + c = a + (b + c), {ab) c = a (be): a (b + c) = ab + ac; a + = a, a • 1 = a for all a ; if a=^0 and ab = ac, then b = c. It is worth remarking that we could derive these "laws" for 31 52 EXPLORING UNIVERSITY MATHEMATICS 1 Z from a more primitive set of axioms, but this would take much too long. Further, one can deduce other simple laws from the above. For example, the last law implies that the product of two integers a, b in Z can only be if at least one of them is zero. Order In learning to count we soon learn that some (finite) sets are bigger than others. We formalize this by saying that, of two unequal integers in Z one must be larger than the other. We write a<Cb for "the integer a is smaller than the integer b". The order structure of Z satisfies: SOME PROPERTIES OF INTEGERS AND PRIMES There is no integer between and 1. 55 if a < b, then a + c < b + c and ac <. be if a < b and b < c, then a < c; < a for all a #= 0. for all c, for all c=t=0; In fact many of the results of mathematics depend for their validity on the fact that the ordering of Z satisfies a stronger condition. Axiom. The ordering of Z is such that every non-empty subset E of Z contains a smallest element* that is, there is an element m e E with m <, x for all x in E (any ordered set satisfying this condition is said to be well-ordered)* It is interesting that this axiom cannot be deduced from the laws of algebra together with the fact that Z is ordered. The reason for its importance is that it implies the validity of the principle of induction. Before stating this formally let us make an easier deduction: For suppose that there were at least one c in Z with < c <C 1, then the set E of such c would be non-empty. By the well-ordering axiom E has a least member m and < m <C 1. But then, multiplying this inequality by m, we obtain < m 2 < m < 1 so that m 2 also belongs to E and is smaller than m. This contradiction establishes the truth of the statement. Mathematical induction We must start by formulating carefully the method of induction which we use frequently. An essential ingredient of such a proof is that we have a series of statements depending on the integer n belonging to Z. If H n is such a statement and (i) Hj t is true, (ii) the truth of H n implies the truth of H rt +i for every n in Z such that n > k; we want to deduce that H n is true for every n ]> k. For example H n could be the statement 2«>n; by proving (i) and (ii) for this statement with k = we would like to say that it is true for every integer n in Z. If we call this method of proof the principle of induction, then The principle of induction is valid. For. suppose the conditions (i), (ii) are satisfied. Let E be the subset of Z consisting of those integers n > k for which H n is false. If E is not empty, it has a least element m. Then H m is false so, by (i), m=t=fc. Hence m>k, so that m — 1 > k, since 1 is the smallest positive integer. It follows 34 EXPLORING UNIVERSITY MATHEMATICS 1 that m — 1 is not in E so that H m -\ is true and this implies the truth of H m by (ii) with n = m — 1 > k. But H m cannot be both false and true, so our assumption that the set E is not empty is not possible. Hence E is empty, so that H„ is true for all n > k. We deduced the principle of induction from the axiom of well-ordering: it is worth remarking that one could also have assumed the induction principle as an axiom and deduced that the usual ordering of Z is a well-ordering. In order to illustrate the need for care in applying the principle of induction, let us "prove" the following. False Theorem. All the people in any room have the same colour of hair. Let E n be the statement that, if a room contains n persons, they all have the same colour of hair. Since every room must contain a whole number of people it is sufficient to prove that H n is true for every n ]> 1 in Z. Applying the principle of induction (which we have shown is valid) with k = 1, then (i) Hi is true, for in a room with only one person he (or she) has the same colour hair as himself. Now assume the truth of H n and consider any room containing (n + 1) persons. Number these inmates from 1 to (n + 1) and first send person number 1 out of the room. The room now contains n people and by hypothesis they all have the same colour hair. Call number 1 back and send out person number (n + 1). The room again has n people in it — all with the same colour of hair. But now person number 1 must have the same colour of hair as persons 2 to n and the same is true of person number {n + 1). It follows that all (n. + 1) persons have the same colour of hair, and we have deduced the truth of J? n + i and established the second condition of the method of induction. By induction H„ must be true for every n. SOME PROPERTIES OF INTEGERS AND PRIMES 35 Question. What is wrong with the above proof? Hint. Check that condition (ii) is satisfied for every n > 1. Divisibility If a, b are in Z we know that the equation ax = b does not always have a solution in Z. When the equation has a solution it must clearly be unique: in this case we say that b is divisible by a or "a divides b" and write a | b. In general it is possible to have a | be without either a\b or ale. However, there is a special class of integers a for which this deduction is valid and Theorem 2 below will establish it. It is more convenient to define them by a different property. Definition. A prime is any integer p in Z other than or 1, such that the only integers a in Z which divide p are 1 and p. Everyone is familiar with the first few primes: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, . . .; tables have been prepared listing all the primes up to 10 7 and some very large numbers are known to be primes. However, there is a very neat proof, due to Euclid, of the theorem: There are infinitely many primes. Suppose this is false; then the set of primes is finite and they can be listed pi, P2, ■ • •, Pn- Consider the integer i = PiP2- • • Pn + 1- This cannot be divided by any of the primes pi, pi, . . ., p n - But t is bigger than any of p\, p% p n so it cannot be a prime. But any number which is not a prime must be divisible by a prime, since the smallest divisor greater than 1 cannot have divisors other than itself and 1. It follows that t must be divisible by some prime other than 36 EXPLORING UNIVERSITY MATHEMATICS 1 pu ...,p n and we again have contradicted the assumption that these are all the primes. We will return later to discuss some of the known (and unknown) properties of primes. But first we have to continue our more general discussion of divisibility. If a and b are in Z {b 4= 0) then we can always divide a by b to give a quotient q and remainder r which is smaller than b. This can be formulated as a theorem. Euclidean Algorithm. Given integers a, b in Z with b>0 there exist unique integers q, r in Z with a = bq + r, < r < b. We first prove uniqueness. Suppose q it r i} q 2 , r 2 satisfy this relation, then so that a = bqi + ri = bq 2 + r 2 & I Qi ~ Q2 I = I n — r 2 I . Thus b divides I n — rg I which is less than b, so we must have ri — r 2 = 0, whence qi = q 2 . Now let E be the set of integers of Z of the form a - bx, with x in Z. E is not empty since x = gives the integer a in Z. By the well- ordering axiom the set E has a least member r corresponding to an integer q in Z. Then a = bq + r and we have only to show that 0^r<b. U r ^ b, then a - b (q + 1) would be a smaller integer in E contradicting the definition of r. Hence the relation is completely established. Greatest common divisor Given two integers a, b in Z we call cf the g. c. d. of a and b if d is a divisor of both a and b, and any x in Z which divides both a and 6 also divides d. Thus some properties of integers and primes d\a, d\b and x I a, re I £> imply .x I d. 37 Note that the adjective "greatest" really means that d is a multiple of any other divisor. Theorem 1. Any two positive integers a, b have a unique g. c. d. in Z, denoted (a, b). There are integers u, v (not unique, and not both positive) such that (a, b) = ua + vb. This can be proved by repeated application of the euclidean algorithm. (I believe that it is usual nowadays to teach people to find (a, b) by factorizing a and b into prime factors — this is illogical as the method can only be justified if one has already proved this theorem or something equivalent.) Applying the algorithm we get a = bqi + r u (0<ri<b). Now if an integer x divides both a and b it must also divide n, and similarly if x divides b and ri it must divide a: it follows that (a, b) = (b, n). Repeat the argument on b and ri if ri =}= 0, b = n q 2 + r 2 , (0 < r 2 < ri). As the remainders decrease by at least 1 each time, the process must ultimately end with a zero remainder after a finite number of steps: r»-2 =■ r»-iq n + r n , (0 <, r„<r„_i); T n -i = r n q n + h It is clear that each of the pairs a, b; b, ri; n, r 2 ; . . ,;r„-i, r n have the same set of common divisors: (a, b) = (r n -u r n ) = r n . 38 EXPLORING UNIVERSITY MATHEMATICS 1 SOME PROPERTIES OF INTEGERS AND PRIMES 39 The g. c. d. must therefore be the last non-zero remainder in this (finite) process. The uniqueness for the g. c. d. follows from the fact that if d\, d-z both satisfy the definition for g. c. d., then d\ I lis and dz | d\, which implies d\ = cfe. The evaluation of integers u, v such that (a, b) = ua + vb can also be carried out from the system of equations, by successive substitution giving each r,- in turn as a linear sum u<a + v%b: r\ = a — bqi = a + (- qi) b, rg m b — q 2 n = (— q 2 ) a -f (1 + qiq 2 ) b, etc. The existence of a g. c. d. enables us to obtain the most important property of primes: Theorem 2. For any prime p, if p | ab then p\a or p\b. Suppose p | ab but p is not a divisor of b, then the only common divisor for p, b is 1, and so (p, b) = 1. This means that, for suitable integers u, v 1 = up 4- vb. Multiplying by a gives a = upa + oab, and it is clear that p divides both terms on the right-hand side so that p j a. We have already seen (in the proof that there are infinitely many primes) that any positive integer which is not a prime is divisible by some prime. This allows us to express any integer n > 1 in Z as a product of a finite number of primes. We can now prove: Theorem 3. The expression of an integer n > 1 as a product of primes is unique apart from the order of the factors. Suppose we have two factorizations into primes: li = P\ P2 • • • Pn = q\ 32 . • • q m - Then since p\ | qi (q 2 . . . q m ) either p\ | q\ or pi | q 2 (g» . . . q m ). by Theorem 2. By repeated application p\ | qi or p\ | q 2 or Pi | <Z3 (q4 • • • qm) ; and so on. This means that there is some fci for which pi | q,k v But then we must have p\ = q/^and this term can be cancelled from the product leaving P2 P3 • • • Pn = q2 qs' ... q m ', where the right-hand side consists of the remaining factors qi. The argument can now be repeated showing that each of the factors pi, . . ., p„ occurs among the primes qi. After they have all been cancelled there cannot be any q's left. Hence n = m, and the factors q,- are just a rearrangement of the factors p<. The theorems we have proved are all (with the possible exception of (a, b) = ua + vb) well known to any schoolboy in the sense that he believes them and uses them, even if they are not formulated in his mind. Thus all we have done so far is to indicate the sort of arguments which can be used to justify common arithmetical techniques. It is gratifying to know that these are valid techniques since most of us have found them very useful in practice. The reader might like to evaluate the g. c. d.'s (180, 252), (1001, 7655) by the process of Theorem 2 to convince himself that this is a sensible method. Let us now consider some other properties of primes which either cannot be proved easily or constitute unsolved problems. 40 EXPLORING UNIVERSITY MATHEMATICS 1 Distribution of primes We have already shown that there are infinitely many primes. These can clearly be written clown in increasing order of magnitude to give an infinite sequence of prime numbers, Pi, P2, ■ • •> Pn, — If one examines this sequence by looking at a table of primes one forms certain impressions: (i) the sequence looks irregular if one only considers a relatively small part of it at once; (ii) prime numbers become rarer among the large integers and one feels that if an arbitrary integer greater than, say a million, is considered, its chance of being a prime is very small; (iii) if one counts the number of primes in large blocks, say between 2 k and 2 k+1 for k = 5, 6, . . ., one obtains a sequence of integers which is fairly regular, and does not seem to tend to zero; (iv) no matter how far one looks in the table there appear to be (occasional) "prime pairs", that is integers n such that both n and (n + 2) are primes. We can formulate the observations (ii) and (iii) precisely: the status of (iv) is quite different — it constitutes a famous unsolved problem of number theory and is striking because it can be stated so simply: Are there infinitely many positive integers n such that both n and (n + 2) are primes? and yet we arc still nowhere near a solution. Incidentally it is possible to prove that any arithmetical progression of the form a, a + d, a + 2 d, . . ., a 4- kd, . . ., where (a, d) = 1 (why SOME PROPERTIES OF INTECERS AND PRIMES 41 is this condition essential?) contains infinitely many prime numbers. Since the sequence of primes is completely determined (though it is not possible to give a simple arithmetical formula which yields only primes, let alone gives all the primes) we can define a function by / (n) = number of primes p <L n. Since there are infinitely many primes we know that f (n) -> oo as n — » oo, but our observation (ii) above leads us to suspect that /' (n) is small compared with n for large values of n. The density of primes among the first n integers is d {n) = /(«) and can be computed for particular values of n. It can be proved without too much labour that d (n) — > as n -> oo, so that, for very large n, if we pick an integer at random among the first n integers it has only a small chance of being a prime. Actually, although d (n) -» it does so rather slowly, for n = 10 9 , d (n) is approximately ^. We observed in (i) that the distribution of individual primes among the integers is extremely irregular. However, this irregularity "in the small" disappears if we look instead at the average distribution given by d (n). The very regular manner in which d (n) — >0 as n ~> oo is one of the most remarkable discoveries of mathematics. If log n denotes the logarithm to base e of the integer n, Gauss already noticed that d (n) is very close to 1/log n and that the closeness of the 42 EXPLORING UNIVERSITY MATHEMATICS 1 SOME PROPERTIES OF INTEGERS AND PRIMES 43 approximation appeared to improve as n increased. If we measure the goodness of the approximation by r(n) = d(n) 1/log n then r (10 3 ) = 1159, r (10 6 ) = 1084, r (10 9 ) = 1053. Gauss conjectured early in the nineteenth century that this ratio r (n) -* 1 as n -» °o, but he was unable to prove it. This result is known as the prime number theorem. Although the result can be stated using only relatively simple concepts, it was almost a hundred years before analysis was developed sufficiently to provide the tools for a proof. (The first proofs were given in 1896 by Hadamard and de la Vallee Poussin.) The first proofs used elaborate machinery of complex function theory, but more recently so-called "elementary" proofs have been devised. These are still too difficult to be included in the average honours course for a university mathematics degree. of "very large". If the Goldbach conjecture is correct then it follows easily that every integer n is the sum of at most 3 primes. Nothing in this chapter has been original. I have only tried to do two things. Firstly, to indicate how the usual familiar arithmetical processes can be justified. Secondly, to illustrate that difficult theorems and unsolved problems can be stated using very simple concepts. References for further reading BlRKHOFF, G. and MacLane, S., A Survey of Modern Algebra, Macmillan, 1941. COURANT, R. and ROBBINS, H., What is Mathematics?, Oxford, 1948. Goldbach conjecture In 1742 in a letter to Euler, Goldbach observed that, for every case he tried, each even number, other than 2, could be expressed as the sum of two primes. He wanted to know if this was true in general. We still do not know today though very strenuous efforts have been made on this problem. The cause of the difficulty is that primes are defined using multiplication, while the problem involves addition. However, there has been some progress towards attacking the problem in recent years, and it is not hopeless to expect a solution in a finite time. For instance Vinogra- doff showed that for all very large integers n, it is possible to represent n as a sum of at most 4 primes, but his proof is an existence proof and cannot yield a precise definition CHAPTER 4 Waves P. Chad wick 1. Introductory remarks When interviewing candidates for university admission I am often asked the question "What is applied mathematics at the university all about?" My stock answer is that applied mathematics at the university continues to be largely concerned with mechanics, but that we now look beyond the motions of particles and rigid bodies to the mechanics of materials such as gases, liquids and real solids which deform when acted on by forces. This is all very well but it gives no idea of what kind of mathematics is involved in these more advanced studies, nor does it indicate the enormous variety of physical situations which the mathematics is able to portray. At the end of a 20 minute interview a short answer is perhaps excusable, but today I find myself with a generous allocation of time and a large captive sample of the kind of people who turn up at interviews. In order to try to convey to you some idea of the flavour of applied mathematics at the university level I propose to discuss in some detail a topic which crops up repeatedly during the three-year duration of the typical course. This is the theory of wave motion, and the reason for its frequent appearance is simply that whenever a material body is deformable a disturbance produced locally spreads out through the body as a wave. 44 WAVES 45 Of course waves are a common part of our everyday experience. We have all enjoyed generating water waves in the bath and idly observing waves by the sea-side. On analysing these experiences more closely we realize that light waves are involved in conveying the undulations of the water surface to our eyes, and sound waves in enabling us to hear the noise which accompanies the motion of the sea. From time to time we read or hear of earthquakes in which waves travelling through the earth's crust cause vibrations of the ground capable of demolishing buildings and other structures. Recently radio waves from outer space have been in the news, while terrestrial radio waves are the means of wireless and television broadcasting. We are all, then, familiar with waves and in particular the crucial role which they play in communications. In your case, moreover, this is an informed familiarity because you have all had physics lessons in which certain important properties of waves have been discussed and explained. You know, for instance, that waves can be reflected and refracted, can exhibit interference and diffraction, and can sometimes stand still as in an organ pipe or on the strings of a guitar. But the applied mathematician's approach to waves is rather different from that of the physicist in that he is interested in building up a general mathematical description of wave phenomena rather than in giving so-called physical explanations of particular wave properties. The operative word here is general. It is characteristic of the mathematician that he seeks to place his results in the most general possible setting. In this way he is often able to show that seemingly diverse ideas have a common basis. Increasing generality and the development of unified theories are marked features of university mathematics, and I hope that my later remarks will show you how these tendencies operate in the particular case of wave theory. Before turning to some actual mathematics, mention should be made of the considerable impact which the study of 46 EXPLORING UNIVERSITY MATHEMATICS 1 wave phenomena has had on the historical development of mathematics. The classical theory of waves is largely the creation of the great mathematicians of the eighteenth and nineteenth centuries, but during the present century there have been revolutionary developments in theoretical physics in which the wave concept is of fundamental importance; and progress has been made in the study of what are called non-linear waves. The best known example of a non-linear wave is the shock wave. Shock waves are produced in the air by explosions and when an aeroplane accelerates until its speed exceeds the speed of sound. These supersonic bangs are also much in the news at the present time, but in this chapter I shall consider only the more domesticated linear waves. 2. Simple harmonic oscillations In order to fix a few ideas I would like first to say something about simple harmonic oscillations. Let us con- sider three examples. (a) Fig. 4.1 (a) shows a particle P of mass m attached to one end of a light elastic spring. The other end of the spring is fixed and the whole system (i. e. the spring plus the particle) is assumed to move in a fixed horizontal straight line. Let the spring have natural length / and elastic modulus X, and let the displacement of P from its position of equilibrium be £. Then the equation of motion of P (i. e. Newton's second law of motion applied to P) is dH X where t denotes time. (b) In Fig. 4.1 (b) the particle P (again of mass m) is free lo slide on the smooth curve C which is fixed in a vertical plane. Let the tangent to C be horizontal at the lowest point O and suppose that P is released from rest at a point WAVES 47 on C close to O. Then if s denotes the arc distance OP and R is the radius of curvature of C at O, the equation of motion of P is, to a first approximation, d s s mg "> ~r~ = — ir s> dt* R g being the acceleration due to gravity. \-iumrmrmmrmr- • 9 m 111 (C) equilibrium position of P Fio. 4.1. Systems executing simple harmonic oscillations. (c) The third example (see Fig. 4.1 (c)] concerns a uniform circular disc attached at its centre O to a vertical wire the upper end of which is fixed. We suppose that the disc is mounted in a horizontal plane and is rotated slightly from its equilibrium position before being released from rest. It 48 EXPLORING UNIVERSITY MATHEMATICS 1 then proceeds to execute torsional oscillations governed by the equation d 2 /— = -DO, dt* being the angular displacement of the disc from its equilibrium position. The constants / and D appearing in this equation are respectively the moment of inertia of the disc about the axis of rotation and the torsional rigidity of the wire. These three examples have two features in common. (i) The motion is completely determined when a single variable (f, s or 6) is known as a function of the time t. (ii) This dependent variable in each case satisfies a differential equation of the form d^_ dt* + w 2 £ = 0, (1) where w is a constant characteristic of the physical system. Thus we can formulate the following statement. Whenever the state at time t of a system is specified by a single number which is determined by solving a differential equation of the form (1), then that system is performing simple harmonic oscillations. In the above statement the concept of simple harmonic oscillations is associated with a purely mathematical idea, ! WAVES 49 namely the differential equation (1). This idea is essential to all simple harmonic oscillations irrespective of the particular physical context in which they occur. Had we defined simple harmonic oscillations on the basis of acceler- ation being proportional to distance, our definition would have had to be worded so as to include linear acceleration [example (a)], acceleration along a curve [example (b)] and angular acceleration [example (c)], and even then it would apply only to the oscillations of a mechanical system. But | in equation (1) could represent current in an electrical circuit or, say, the brightness of a star at a given time. We have here, therefore, an example of the way in which the mathematician relates effect occurring in different physical situations to a central mathematical idea. In fact we have set up what might be called an abstract mathematical theory of simple harmonic oscillations which consists of the study of the solutions of equation (1). As is well known, assuming that o) 4= 0, the most general solution of equation (1) is f = a cos {oyt + a), (2) i where a and a are arbitrary constants. In other words, every solution of equation (1) can be obtained from the expression (2) by assigning suitable values to a and a. For a particular oscillation, equation (1) must be supplemented by two subsidiary conditions which serve to determine a and a. These conditions usually consist in specifying the values of f and dtj/dt at a particular time, normally t = 0. In physical terms this amounts to stating the manner in which the oscillation is excited. The basic differential equation, then, together with two subsidiary conditions determines a unique solution of the form (2). a is called the amplitude of the oscillation and a 50 EXPLORING UNIVERSITY MATHEMATICS 1 the phase angle. The period, 2ti/co, depends entirely upon the nature of the system and not upon the manner in which it is made to oscillate. 3. Oscillations, waves and differential equations From what has been said about simple harmonic oscillations I hope that you will now understand what is meant by saying that, to the applied mathematician, an oscillation is something which involves the solution of one or more differential equations. The solutions are functions of time and they usually have a periodic character like the result (2). This leads us to expect that the mathematics of waves, which I mentioned earlier, will also be, basically, a matter of solving differential equations. Thus the whole subject of oscillations and waves derives its mathematical framework from the calculus which, after all, has been described (by E T. Bell) as the "chief instrument of applied mathematics". The mathematical distinction between an oscillation and a wave is that whereas oscillations are governed by ordinary differential equations, waves are governed by partial diffe- rential equations. Now before going any further we must be quite sure that we understand what these terms mean. In the differential calculus we meet functions of a single variable [e.g. f {x)\ and study, among other things, properties of their derivatives (e. g. dfldx, d 2 f/dx 2 etc.). Later (usually at university) we encounter functions which depend upon two or more variables, for example q> (x, y). Here we need to know values of both x and y before we can determine rp. Now this function, <p (x, y), can be differentiated in more than one way. We can imagine that y is a constant and then differentiate q> as if it were a function of x only — this derivative is written dcpldx — or we can treat x as a constant and differentiate with respect to y, obtaining in this case dcpldy, dcpldx and dcpldy are called the partial "WAVES 51 derivatives of <p f and we can also define partial derivatives of the second and higher orders. To take a simple example. If cp (.v, y) = x 2 y + xy s , dw d<p then -J- = 2xy + y* and — = X s + 3 xy*. dx Oy A differential equation is an equation which contains deri- vatives: it is said to be a partial differential equation if it contains partial derivatives; otherwise it is called an ordinary differential equation. Equation (1) is an ordinary differential equation and we shall now expect to set eyes on a partial differential equation as soon as we embark on a mathematical treatment of waves. 4. The wave equation Having prepared the ground let us now try to arrive at this wave equation as rapidly as possible. We shall do this by constructing a mathematical representation of a simple type of wave. According to the Shorter Oxford English Dictionary, a wave consists of "each of those rhythmic alternations of disturbance and recovery of configuration in successively contiguous portions of a body or medium, by which a state of motion travels in some direction without corresponding progressive movement of the particles succes- sively affected". This is a heroic attempt at the hopeless task of embracing in a single statement all the possible meanings of the term "wave". However, we single out for our present purposes the idea of a wave as a disturbance travelling in some fixed direction. Consider a "waveform" travelling in a fixed direction with constant speed c. We set up two sets of axes with origins O and O' as shown in Fig. 4.2 (a) and (b), O being fixed and O' moving relative to it along the x-axis with speed c. Thus O' 52 EXPLORING UNIVERSITY MATHEMATICS 1 moves with the waveform which, relative to the axes with origin O', can be represented by an equation of the form <P = f(X), (3) X being distance measured from O' in the direction of propagation. The function / specifies the wave profile. rig. 4.2 (a) depicts the situation at time to, and we suppose that, at this instant, OO' = a. Then the relation between (a) time t, ID) lime •a + cn-t, — X Fig. 4.2. Representation of a travelling waveform. the distances x and X at time U, is X = x -a. Fig. 4.2 (b) shows the position of the waveform at time t {> t ). In the interval t — to which elapses between the two states, O' moves a further distance c (t — t ) away from O so that, at time /, X = x~a-c(t- t ). (4> Combining equations (3) and (4) we obtain the following representation of the travelling waveform at time t: WAVES <p = f [x — a — c(t — to)}- 53 (5) Now the constant a is at our disposal and so we can set a = cto: this simply means that O and O' coincide at time t = 0. Equation (5) then simplifies to <p (*, t)=f(x- ct), (6) the notation on the left-hand side indicating that the function <p representing the waveform depends upon distance x in the direction of propagation and time t. There is no need for us here to associate <p with any particular physical quantity. Equation (6) is the required result. For any function / it represents a wave propagating with constant speed c and without change of shape in the positive .^-direction. By an exactly similar argument a representation of a wave propagating in the same manner in the opposite direction is <P (x, i)=g(x + ct). (7) Now what we are really seeking is a partial differential equation which has the expressions (6) and (7) as solutions. In order to find this equation we work out some partial derivatives of <p. From equation (6), <p = f {X), where X = x — ct, and so dcp Jx~ Thus df dX ~dX dx df d<p df dX = dX ~dt ~"dX~dt d s q> ~dx~* d*f d*cp d 2 f ~ dX*' ~dt* " dX 2 1 dfy d 2 (p c* dt* dx°- = — c dX (8) 54 EXPLORING UNIVERSITY MATHEMATICS 1 and it can easily be shown that the expression (7) also satisfies this equation.f We have therefore arrived, by a direct but essentially intuitive argument, at what is called the one-dimensional wave equation (8). Equation (8) is said to be one-dimension- al because <p depends upon time t and a single coordinate x. 5. Waves on a taut flexible string Now that we know what the wave equation looks like we can set about showing, in a more or less respectable way, how it governs a particular type of wave motion. Probably the simplest physical system which admits wave propagation is a stretched string, and we proceed on the basis of the following assumptions: (i) The string is uniform and perfectly flexible; (ii) The weight of the string is negligible in comparison with the tension; (iii) Fluctuations in the tension due to the motion are negligible; (iv) The string suffers only small deviations from its equilibrium position. In view of assumption (ii), the equilibrium position of the siring is a straight line and we take this to be the x-axis. The displacement cp of the string from its equilibrium position will, in general, vary with time and from point to point on the string. Thus cp is a function of * and t. Consider an t It will be observed that the expression (6) satisfies the partial differential equation _ 1^.— &!£ c dt~ dx ' which is simpler than equation (8). But this partial differential equation is unacceptable because it is not satisfied by the ex- pression ("). WAVES 55 element PQ of the string of length ds and denote by tp and tp + dtp the angles which the tangents to the string at P and Q make with the x-axis (see Fig. 4.3). Because of assump- tions (i), (ii) and (iii), PQ is subject to no forces other than Fig. 4.3. Displaced configuration of a taut flexible string. the constant tension T acting tangentially at P and Q. The equation of motion of the element perpendicular to the x-axis is therefore gds —^- = T sin (tp + dip) — T sin tp dt* = T cos?/; dip, where g is the (constant) mass per unit length of the string. Dividing by Tds and proceeding to the limit ds — » 0, there followsf p d z cp dtp cos tp — — — = cos tp — T a/ 2 r ds R (9) R being the radius of curvature of the displaced configuration of the string. Finally, we express assumption (iv) in the t It should be verified that the terms neglected do not affect the limit. 56 EXPLORING UNIVERSITY MATHEMATICS 1 following more precise form: at all points of the string and at all times the angle y is small. Then 1 _ d*q> and with these approximations equation (9) becomes 1 d s <p c 2 dt* d*-(p where c 2 = Tig. Thus the transverse displacement cp satisfies the one-dimensional wave equation (8). Having shown earlier that the expressions (6) and (7) are solutions of equation (8), we can now assert that, subject, of course, to the four assumptions which I have listed, waves can travel along a taut flexible string in either direction without changing their shape and with constant speed V (T/q). A system which transmits waves of arbitrary form without distorting them is said to be non-dispersioe. In order to examine the wave motion of a stretched string in more detail, we now take the very important step of observing that when we add together the functions / and g appearing in equations (6) and (7) we get a further solution of equation (8): that is the expression q> = f(x-ct)+g(x + ct) (10) satisfies equation (8). Now this means that waves can travel along our string in both directions at the same time, each wave propagating exactly as if the other did not exist. Equation (10) tells us that the displacement of the string at a given point is obtained by adding together the displacements produced by the two waves propagating by themselves. Mathematically this result is a consequence of a general WAVES 57 property of equation (8), namely that if we have a number of solutions then their sum is also a solution. This property, which is called the principle of superposition, is one of (he corner-stones of the mathematical theory of linear waves. So far we have not had occasion to specify the form of the functions / and g, nor has anything been said about what happens when a wave reaches an end of the string. Let us now consider a case in which / and g both represent sinu- soidal waves by putting / (x - ct) = ha sin{k (x - ct)),g (x + ct) = \ a sin{fc (x + ct)}. Here a and k are constants, and the reason for including the factor h will appear in a moment. In view of equation (10) we can now state that a solution of the wave equation (8) is <p = ha [sin{fc (x - ct)}+ sin{fc (x + ct)}\ = a sin foe cos kct. (11) FlG. 4.4 Standing wave on a taut flexible string. To see what kind of disturbance of the string this represents, first take t to be constant; that is take an imaginary snapshot of the string at a certain instant. At this instant the string is in the form of a sine wave (see equation (11) and Fig. 4.4), and we see that there are certain points, called nodes, at which the string is permanently undisplaced. Next take x to be constant: that is choose a particular point P on the string. It follows from equation (11) that P executes simple harmonic oscillations about its equilibrium position Po. The overall picture which emerges, therefore, is that of a standing wave, and we conclude that the superposition of two sinu- 58 EXPLORING UNIVERSITY MATHEMATICS 1 soidal waves of equal amplitude travelling in opposite direc- tions gives a displacement pattern of a stationary (i. e. non- progressive) character. Referring to equation (11) we see that the positions of the nodes are given by sin kx = 0. The non-negative solutions of this equation are n 2n *-*• *• T so that the nodes are equally spaced at intervals of nlk. Now if we fix the string at a node, then cut it immediately to the left of the fixed point and throw away the left-hand portion of the string the remainder goes on vibrating exactly as before. In other words, conditions at a node are exactly the same as conditions at a fixed end of the string. This means that a standing wave can be maintained on a string of finite length / fixed at its ends provided that / is an integral multiple of nlk. But the constant k is at our disposal, so it follows (after a little reflection) that a standing wave of amplitude a with n + 1 nodes on a string of length I is represented by the solution (11) with k = null. Standing wave configurations with 2, 3 and 4 nodes are shown in Fig. 4.5. Although these remarks on standing waves have referred specifically to a taut string fixed at its ends, we can state the end-result in purely mathematical terms. We have shown that solutions of equation (8) satisfying the subsidiary conditions g> = when x = and when x = I are given by nnx nnct <p = a sin — 7- cos — ; — , where n = 1 , 2, 3, . . .. / / WAVES 59 Now in a university course these solutions would probably be found by applying to equation (8) a technique (called the method of separation of variables) which yields other solutions capable of satisfying different subsidiary conditions. In this way an abstract theory for waves satisfying the one- First harmonic (tunda mental )k =vl\ Second harmonic, k = 2*71 Third harmonic, k = 3w/ FlG. 4.5. Standing wave configurations with 2, 3 and 4 nodes. dimensional wave equation is built up, and this theory can at once be brought to bear on the physical situations from which equation (8) emerges. In the remainder of this chapter I shall take a brief look at two such situations. 6. Sound waves Sound waves are waves of compression, and since all real materials undergo changes of volume when subjected to pressure, sound waves can propagate in gases, liquids and 60 EXPLORING UNIVERSITY MATHEMATICS 1 solids. The situation in respect of solids is rather different, and more complicated, than for gases and liquids, however, and we shall defer consideration of elastic waves until later. Suppose first that we have a tube containing a gas, say air, and propagate a wave along this column of air, perhaps by striking one end of the tube. Then u, the displacement of a typical point of the gas from its equilibrium position, is a function of x, distance measured along the tube, and time t. It can be shown, by writing down the equation of motion of an element of the gas, that If satisfies the one-dimensional wave equation 1 d 2 u _ d 2 u c 2 dt 2 ~ dx* ' where c, the speed of sound in the gas, is given by c* = k/g . Here k is a suitable bulk modulus and go is the density of the gas in its undisturbed state. By arguments with which you are no doubt familiar, the appropriate bulk modulus is given by k = ypo, where y is the adiabatic index and po the pressure of the gas in its undisturbed state. Although waves on a taut flexible string and sound waves in a tube are governed by the same partial differential equation, they differ in one important respect. The string is everywhere displaced at right angles to itself, whereas the passage of a sound wave through a gas is accompanied by motion of points of the gas in the direction of propagation of the wave. Thus we say that sound waves are longitudinal waves, while the waves on flexible strings which wc have discussed are transverse waves. Now, of course, gases are not always confined to tubes, and it is a matter of common experience that sound can WAVES 61 travel through an unconfined body of gas such as the open air. The question therefore arises: "What is the equation which controls the propagation of sound in the most general circumstances?" The answer is the three-dimensional wave equal ion 1 d 2 <p d 2 <p d 2 (f ~ dx 2 c* dt 2 d 2 q> dy 2 dz* ' (12) w v displacement vector FlG. 4.6 Three-dimensional rectangular cartesian coordinate system. where x, y, z are rectangular cartesian coordinates (sec Fig. 4.6). The dependent variable <p in equation (12) is a function of the four independent variables x, y, z, t, three of which are held constant when calculating one of the partial derivatives. <p does not represent the displacement of a typical point of the gas (which is now a vector quantity), but the displacement can be found when <p is known. The three-dimensional wave equation (12), besides being the basic differential equation in the theory of sound waves, enters into the theory of electromagnetic waves, such as 62 EXPLORING UNIVERSITY MATHEMATICS 1 light and radio waves. There is also a two-dimensional wave equation, obtained by deleting the final term of (12), which governs the transverse motion of a stretched membrane situated, when undisplaced, in the (x, i/)-plane. 7. Elastic waves Finally, we come to elastic waves under which heading, as mentioned earlier, we include sound waves in solid materials. The property of elasticity has to do with the ability of a solid body to deform in response to applied forces and then to recover its original form immediately those forces are removed. A very wide range of solids exhibit this property so long as the applied forces are not too large: examples arc the common metals and alloys, rock and concrete. Now a solid can be deformed in different ways. Like a gas or liquid it can be compressed, and so has a bulk modulus k which measures the volume change produced by unit pressure. Unlike other forms of matter, however, a solid can sustain forces tending to shear, or twist it, and the stress required to produce unit shear is measured by the modulus of rigidity ft. Thus two material constants, k and //, are needed to specify the elastic properties of a solid, and our earlier results lead us to expect that these constants, together with the density g, will determine the speed with which waves propagate through the material. The displacement suffered by a typical point of an elastic solid during the passage of a wave is a vector quantity and so has three components u, v, w (see Fig. 4.6) each of which, in general, is a function of the four variables x, y, z, i. It follows that wave motion in an elastic solid is governed not by a single partial differential equation but by three. These equations turn out not to be wave equations, but the follow- ing: WAVES d 2 u d (du 6d dw\ e— = (* + M— I— + — + — \ + f* dt 2 d 2 v dx \dx dy dz ) d (du dv dw\ ( (d 2 u [dx^ d 2 u d 2 u dy* dz* dt* dy (dx dy dz (d 2 v d 2 v d 2 D dx* dy* dz* d 2 w d , e-^ = ( fc + M^- T- + T- dt* dz (du do dw dx dy dz dw\ (d*w d*w d*w dz* The solution of equations (13) is clearly a formidable under- taking and has been exercising mathematicians for the last 100 years or so, but we can achieve a surprising degree of simplification by supposing that the displacement compo- nents depend only upon x and t. Equations (13) then reduce to d*u d 2 u PX" d 2 V d 2 o 9 dt* ** dx* d*w ~dt* d 2 w dx 2 (14) which are three one-dimensional wave equations. Now looking for solutions of equations (13) which depend only upon x and t is tantamount to considering the possible existence of elastic waves travelling parallel to the Ar-axis which give the elastic body the same displacement at all points on planes perpendicular to the x-axis. Since we have come to realize that whatever quantity satisfies the wave equation is propagated as a wave, we conclude from equations (14) that waves of the assumed type (plane waves) can in fact exist. Moreover, we see that there are two kinds (13) 64 EXPLORING UNIVERSITY MATHEMATICS 1 WAVES 65 of elastic waves travelling with different speeds, the two speeds being a ={(* + ! V)/Q} l/S , c 2 =(///p)V S . For all known solids, ci > c-2. Since u is the displacement component in the direction of the avaxis, the wave travelling with speed Ci is longitudinal. Elastic waves of this type are directly analogous to sound waves and are called P-waoes (P for primary or for push- pull). The waves travelling with speed c& on the other hand, are transverse and have no counterpart in gases and liquids. They are called S-waoes (S for secondary or for shake). We have shown here that an elastic disturbance of a particularly simple kind travelling parallel to the jc-axis is composed of a P-wave and two 5-waves. In fact this result can be generalized and it can be shown that any elastic wave travelling in the interior of a solid body is made up of a P- and an S-constituent. 8. Seismology and the earth's interior To end on a physical rather than a mathematical note, I would like to indicate how the simple facts about elastic waves which have just emerged have been used in determin- ing in broad outline the internal structure of the earth. At the beginning of this chapter I mentioned earthquakes. The science which is concerned with the measurement and interpretation of earthquakes is called seismology and it is the most highly developed branch of physics of the earth or geophysics. Earthquakes occur within the outermost 500 miles of the earth, usually within 50 miles of the surface. A particular earthquake causes most disturbance at the place on the earth's surface immediately above, but in addition it generates elastic waves which travel outwards from the source of the eathquake in all directions and can be detected at points far removed from the severely disturbed area. The elastic disturbance produced by an earthquake contains both P- and S-waves which, of course, travel with different speeds. Moreover, because the density, bulk mo- dulus and modulus of rigidity of the material from which the earth is made vary with depth, so do the speeds c\ and C2, and this means that the seismic waves do not travel through the earth in straight lines but follow curved paths as shown in Fig. 4.7. In their journey through the earth Epicentre Earthquake locus Fig. 4.7. Paths of seismic waves. the waves may bounce off the surface and when this happens an incident P-wave may give rise to a reflected P-wave and a reflected 5-wave as well. It will be appreciated, therefore, that the motion of the ground recorded at great distances from an earthquake is very complicated. Nevertheless, this is the record of a disturbance which has travelled through the deep interior of the earth, bringing to the surface information about the nature of the material through which it has passed, and it is the deciphering of earthquake records (or seismograms as they are called) which has yielded much of our present knowledge about the inaccessible interior. The task of determining from seismograms the paths along which the elastic waves generated by a particular earthquake travel through the earth calls for elaborate calculations as well as a high degree of skill and experience on the part of the seismologist. During the past 70 years, however, a 66 EXPLORING UNIVERSITY MATHEMATICS 1 network of seismological observatories covering most of the continents has been built up and more and more earthquakes have been recorded and analysed. From this vast quantity ol data, tables have been computed which make it possible to reconstruct the wave paths emanating from a particular earthquake with great accuracy. The picture which emerges from this reconstruction is a striking one. Suppose that an earthquake were to occur beneath the North Pole. Then over the entire northern hemisphere and as far as latitude 15° S. (i. e. to a circle passing through central Brazil, Zambia and the north of Australia) the arrival of both P and S waves would be recorded. Between latitudes 13° S. and 52° S. (a circle through the southernmost tip of South America) hardly any ground motion would occur, and between 52° S. and the South Pole only P-waves would be recorded. The interpretation of this pattern resulted from ideas put forward in 1906 by R. D. Oldham and later followed up by a number of other seismologists. The results obtained by these workers indicate that the earth consists of a liquid core surrounded by a thick solid shell. The core, being liquid, is unable to transmit S-waves and, furthermore, is a much poorer transmitter of P-waves than the solid shell. The effect of the core on seismic waves can thus be pictured as follows. Imagine the earthquake to be replaced by a lamp emitting light of two colours, white corresponding to P-waves and red corresponding to 5-waves. The core is completely opaque to red light and so will cast a shadow at the earth's surface extending from the South Pole to some lower latitude. While not opaque to white light the core has different transmitting properties from the surrounding mantle. This means that it acts as a giant lens, focusing the white light towards the South Pole and producing a shadow zone at lower latitudes (see Fig. 4.8). The presence of a liquid core within the earth had been predicted during the nineteenth century from rather different considerations. It had been observed, for instance, that on WAVES 67 descending a mine the temperature increases steadily. This rate of increase, if extended to great depths, would imply that the central part of the earth must be molten. Mantle Fig. 4.8. Formation of seismic shadow zones. The confirmation of this earlier conjecture represents one of the most striking achievements of seismology. The solid outer shell of the earth is called the mantle and the depth of the core-mantle boundary below the surface (first determined in 1915 by B. Gutenberg) is 1801 miles.f A further major seismological discovery was made in 1936 by the Danish woman seismologist Inge Lehmann. I have stated that an earthquake occurring beneath the North Pole would cast a shadow between latitudes 13° S. and 52° S. In fact it is found that the shadow is not quite complete, weak t The mean radius of the earth is 3959 miles. 68 EXPLORING UNIVERSITY MATHEMATICS 1 P-waves being recorded within this zone. Miss Lehmann suggested that these waves could be due to the presence of an inner core which is a very good transmitter of P-wavcs (see Fig. 4.8). This explanation has been confirmed and the radius of the inner core fixed at 777 miles. It is widely believed that the inner core is solid, but this has not been conclusively proved. This brief survey shows how the study of earthquakes, in conjunction with the theory of elastic waves, yields a broad picture of the earth's interior as consisting of three major components: an inner core, which is probably solid, sur- rounded in turn by a liquid core, and a solid mantle. 9. Concluding remarks In this chapter I have tried to give some inkling of the kind of material which appears in university courses in applied mathematics and to indicate the habits of thought cultivated by the applied mathematician. I hope that 1 have not utterly depressed you by drawing the curtain aside ou some rather fearsome partial differential equations or raised your hopes too high by ending with a rather spectacular application of wave theory. References for further reading Bell, E. T., Mathematics, Queen and Servant of Science, G.Bell & Sons, 1952 (especially chapter 17). Feather, N., Vibrations and Waves, Penguin Books, 1964. HODGSON, J. II., Earthquakes and Earth Structure, Prentice-Hall 1964. Jeans, J. H., Mathematics of Music, in The World of Mathematics (ed. J. Newman), vol. 4, pp. 2278-2309, Allen & Unwin, 1960. Sutton, O. G., Mathematics in Action, 2»d edition, G.Bell & Sons, 1957 (especially chapter 4). CHAPTER 5 Square Fibonacci Numbers J. H. E. Cohn Introduction It is usually thought that unsolved problems in mathematics, and perhaps especially in pure mathematics must necessarily be "hard" in the sense that the solution, if one is ever found, will involve some essentially new idea or method. (Of course, this description of "hardness" has nothing to do with length; a "hard" proof may be very short and an "elementary" one long and complicated.) In no other field, it is alleged, is this more true than in the theory of numbers. Shortly after taking my degree, when I said to my tutor that I would like to do research in the theory of numbers, I was strongly advised against this on the grounds that the remaining problems were so hard that it was almost certaiu (hat I would fail to solve any, and in the most unlikely event of my succeeding I would deserve election to the Royal Society rather than a doctorate. I feel now that this advice was perhaps rather misguided, and I hope to show this by giving an elementary proof of a well-known conjecture in the theory of numbers, first proved in 1963. As I have said, "elementary" means that all the stages of the proof follow by simple reasoning from known theorems. At several stages I shall quote theorems or results which are well known or easily established. Proofs of these are given in the Appendix. 69 70 EXPLORING UNIVERSITY MATHEMATICS I The problem discussed here is mentioned in Tomorrow's Math, by C. S. Ogilvy, on p. 100. (This book by the way is a veritable mine of easily understood unsolved problems.) Let the numbers u n be defined for all positive integers n, by ui = U2 = 1; u„ + 2 = u« + i + u n for n > 1. The first twenty or thirty such numbers are easily calculated, after which they increase rather rapidly — ueo has thirteen digits. Of the first few numbers, it will be observed that u\ = Ui = \ and U12 = 144 are perfect squares, and there do not appear to be any others, at least for fairly small values of n. It had been conjectured that in fact there are no more, and this conjecture, although it seemed "reasonable" had so long eluded proof that a computer was used to see whether it could find any counter-example. The first million numbers u n were in turn examined and no further perfect square was found. In fact this is hardly surprising; we shall prove in Theorem 3 that there are no more at alL In the first place we observe that we can extend the definition of u n to cover the cases n <, 0, by using the same recurrence relation. It is then found that uq = and u_i = 1 and we shall prove that u n is a perfect square if and only if n = - 1, 0, 1, 2 or 12. Secondly, we shall find it convenient to consider also the Lucas numbers v n defined by vi = 1, vz = 3 and v n *2 — o n + i + v n . These, too, may be defined for all integers, positive, negative or zero. Notation At this stage it seems appropriate to explain the notation which we shall use throughout. All the symbols m, n, x, y, etc., which occur, except a and p are supposed to be integers, not necessarily positive; the symbol k wherever it occurs denotes an even integer, not divisible by 3. The symbol (x, y) denotes the largest positive integer dividing both x and y exactly. If *=t=0, then we write x\y \i x divides y exactly and x <f y otherwise. Finally, if c > 0, we write SQUARE FIBONACCI NUMBERS 71 a = b (mod c) if and only if c | (a — b). Thus for example, (18, - 27) = 9; (4, - 5) = 1; (0, - 3) = 3; 4 | 12; 12 H 16; 21 =6 (mod 5); 18 = - 2 (mod 4), and so on. Preliminaries We shall need the following result a proof of which appears in the Appendix. Lemma. 7/ N is positive, and N = 3 (mod 4) then ^ + 1/2 = (mod N) is impossible unless y and N have a common factor. From the definitions of u n and v n the following values are easily tabulated: n Un V n — 2 — 1 3 — 1 1 — 1 2 1 1 1 2 1 3 3 2 4 4 3 7 5 5 11 6 8 18 7 13 29 8 21 47 9 34 76 10 55 123 11 89 199 12 144 322 Now let a = h (1 + V5) and = i (1 - ]/5) be the roots of the quadratic equation 2 = 6 + 1. Then it is a simple 72 EXPLORING UNIVERSITY MATHEMATICS 1 matter to prove the following sixteen formulae; details are given in the Appendix. a + = lja0--i, (i) u n = 5-'/. fc» - /?»), (2) o n = a" + 0», (3) 2 U„, + n = Un V n + U n D m , (4) 2 D OT + n = 5 ll m U„ + D m D„, (5) u. n = (-l)^u n , (6) D_„= (-1)«D„, (7) *»-*^ + (-l)*-^, (8) D„ + 12 = o« (mod 8), (9) 2|o m if and only if 3 | m, (10) 3 + d w if 4|m, (11) (un, d„) =2 if 3 1 n, (12) (u„, d„) = 1 if 3 + n. (13) D A - * 3 (mod 4) if 2 | fc, 3 -f It, (14) v m + 2k= - v m (mod d a -) if 2 | fc, 3 + fc, (15) U»+2ft = - Um (mod D A ) if 2 | fc, 3 + fc, (16) The main theorems Theorem 1. d„ = .v 2 is possible only for n = 1 or 3. Proof. (1) If n is even, we have by (8) D n = D n/j * ± 2 for otherwise we would have SQUARE FIBONACCI NUMBERS (X + D n/j ) (X- Dn h ) = ± 2, 73 which is impossible, since x — D n/t and x + D„ !t are either both odd or both even; in the former case the product is odd and in the latter the product is divisible by 4. (2) If n = 1 (mod 4), then either n = 1 or n =|= 1. If n = 1 we obtain di = 1 = l 2 . If n=f=l» then n — 1 is a non-zero integer, divisible by 4. Suppose that 3 r , where r > 0, is the highest power of 3 which divides n — 1, Then we may write n - 1 = 2 . 3 r . fc, where 2 | fc and 3 + fc. Thus D» = D 1+2 3 /- fc = (- IKd! (mod D A ) = — 1 (mod Da-). by repeated application of (15) We see therefore that d„4=;v 2 , for otherwise we have x z + l 2 = (mod Dk) which is impossible by the lemma, since by (14) da- = 3 (mod 4) and, of course, da- and 1 have no common factor. (3) Finally, if n = 3 (mod 4), then we have exactly as above, either n = 3 or n4=3. In the former case D3 = 4 = 2 2 , whereas in the latter we may write n — 3 = 2 • 3 r • fc, where r ^ 0, 2 I fc and 3 i fc, and so as before v n = i?3+2.s r .A= (~ l)' r D3= - 4 (mod Da-), by repeated application of (15). Hence again d„^=x 2 , for otherwise we would have x 2 + 2* = (mod da) which is impossible by the lemma, since da- = 3 (mod 4) by (14) and so da- and 2 have no common factor. This concludes the proof of Theorem 1. Theorem 2. d„ = 2 x 2 is possible only for n = 0, 6 or — 6. Proof. In the first place, we must have d„ even, and so by (10) we can assume that 3 j n. 74 EXPLORING UNIVERSITY MATHEMATICS 1 (1) If n is odd, then since 3 | n, n must leave remainder 3 on division by 6, and so remainder 3 or 9 on division by 12, i. e. n = 12m + 3 or 12m + 9 for some integer m. Hence by repeated application of (9) we have 'l2m+9 D « = D 12m+3 0r °l = i>3 or D9 (mod 8) = 4 or 76 (mod 8) = 4 (mod 8). Thus h u n = 2 (mod 4) and so \ o n 4= * 2 for if x is odd then a- 2 is also odd, and if x is even x 2 is divisible by 4. (2) Suppose now that n = (mod 4). Then either n — or n=)=0. If rc = we have d = 2 = 2 . I 2 . If rc 4= then as before we may write n = 2 • 3 r • fc, where 2 | fc, 3 H k and r is a positive integer. Thus just as before we obtain = (-l)s r p =-2 (mod D k ). Thus D k divides {v n + 2) = 2 (J d„ + 1); now by (14) v k is odd, and so since v n is even, £ v n is an integer, and so Dn divides (h d» + 1), that is lv n + 12 = (mod d&). Now, by (14) Dk = 3 (mod 4) and so by the lemma, wc see that h o n 4= x 2 or o„ 4= 2x 2 . (3) The remaining case is n = 2 (mod 4), or n = 2 or 6 (mod 8). Consider first n = 6 (mod 8). Then either n = 6 or n 4= 6. If n = 6 then v& = 18 = 2 • 3 2 , whereas if n =}= 6, we may write n — 6 = 2 . 3 r . fc, where r is a positive integer. 3 'f fc and now 4| A\ Thus as before o n = ( — l) 3r 06 = — 18 (mod D k ). Hence D k divides r> n + 18 = 2 (i d„ + 3 2 ), and SQUARE FIBONACCI NUMBERS 75 since by (14) Dk is odd, it follows that D/ { divides {h d„ + 3 2 ), in other words, I v n + 3 2 = (mod D k ). Now by (14) Dk — 3 (mod 4), and since 4 | k, it follows from (11) that 5 ■{ Dk, in other words 3 and D/< have no factor in common. Hence from the lemma it follows that h D n =t= x 2 , in other words D n 4= 2.x: 2 . Finally, if n = 2 (mod 8) then -n = 6 (mod 8) and by (7) o tt = d_„. Thus D n — x 2 if and only if D- n = x 2 , and by what has immediately preceedcd this is possible if and only if — n = 6, that is n = —6. This concludes the proof. Theorem 3. u„ = x 2 is possible only for n = — 1, 0, 1, 2 or 12. Proof. (1) If n= 1 (mod 4), then either n = 1 or n=fcl. If n = 1 we have u\ = 1 = l 2 . If rc 4= 1 then as before we may write n - 1 = 2 . 3 r . k, where r > 0, 2 | fc and 3-ffr. Thus by- repeated application of (16) we have U n = "l + 2./.A = (- l) 3 ' Ui = - 1 (mod D k ). Thus u.„ + l 2 = (mod Dk), and so by the lemma u n =£ x 2 , since as before by (14) Dk — 3 (mod 4) and D/ : and 1 have no common factor. (2) If n = 3 (mod 4), then - n = 1 (mod 4) and by (6) U- n = Un- Thus u„ = x 2 is possible if and only if U- n = x 2 , that is if and only if — n — 1 or n = — 1, by the preceeding. (3) If n is even, let n = 2 N. Then if u n = x 2 , we have by (4) with m = n = N, r2 = U-2N = UN DN. There are now two cases; — 76 EXPL0R1NC UNIVERSITY MATHEMATICS 1 (A) If 3 + JV then by (13) (un, on) = 1 and sot we must have un — y 2 and dn = z 2 . Now by Theorem 1 the latter is satisfied only for N = 1 and N = 3; the first of these also satisfies the former and gives n = 2; the second must be rejected since we have taken 3 \ N. (B) If 3\N, then by (12) (un, dn) = 2 and sof we must have un = 2 i/ 2 and »# = 2 z 2 . Now by Theorem 2 the latter is satisfied only by N = 0, 6 or —6. Of these values, the first two satisfy the former whereas u_o = — 8 and so N = —6 does not satisfy un = 2 y 2 . Hence we obtain exactly two more values, namely n = or 12. This concludes the proof. Appendix To prove the lemma, we shall first need Fermat'S Theorem. If p is a prime and p \ z then zp- 1 = 1 (mod p). For, we observe that if p is a prime then the (p — 1) binomial co-efficients pCu p C2, p C$, . . ., pC p _i are all divisible by p, since if 1 <L r <, (p — 1), pC t is an integer which equals P(p-l)(p-2)...(p + l-r) 1.2.3...r and since p is prime, none of the factors in the denominator divides p since r <. (p — 1). Thus f Proof in Appendix. or, SQUARE FIBONACCI NUMBERS 77 (x + 1)p = xp + pCi xp~ 1 + . . . + PC p -i x + 1 = xP + 1 (mod p) {x + 1)P - (x + 1) = xP - x (mod p), and by repeated application of this we obtain z p — z = (z - \)p - (z - 1) = ... = IP — 1 =0 (mod p). Thus p divides zp — z = z (zp -1 — 1) for every integer z, and so if in addition p \ z wc have that p divides zp -1 — 1, or zP- 1 = 1 (mod p) as required. Proof of the lemma. We wish to prove // N is positive and N = 3 (mod 4) tf/ierc ac - + ,i/ 2 = (mod N) is impossible unless y and N have a common factor. Proof. Suppose if possible that iV = 3 (mod 4), that N is positive, that (y, N) = 1 and that N divides x* + y*. If N is a prime, write N = p; if N is not a prime we may write N as a product of primes p\, P2, • • ., Pk some of which may be repeated. Now since N is of the form 4 m -r 3, it follows that not all these primes p r can be of the form 4 m + 1 (since the product of two numbers of this form is also of this form), and clearly since N is odd all these primes p r are odd. Thus one at least must be of the form 4 m + 3. Write p for this prime (or for the smallest such if there be more than one). Thus in any case, we have since p divides N, p = 3 (mod 4), (p, y) = 1 and x 2 + y 2 = (mod p). Now since (p, y) = 1 it follows from Euclid's algorithm! that there exist integers a and b such that ay — bp = 1- Let z = ax. Then f See, for example, Chapter 3, p. 36. 7S EXPLORING UNIVERSITY MATHEMATICS 1 z- = a= x* = — a 2 y 2 (mod p) m - {bp + l) 2 (mod p) = — 1 (mod p), and so in particular p •f z. Also p = 3 (mod 4) and so h (p — 1) is an odd integer. Thus zP -i = ( Z 2)V* (p-D = - i ( m od p) which is impossible by Fermat's theorem. This concludes the proof of the lemma. Proof of (1), (2) and (3). We know that a = h (1 + V*>) and = J (1 — J/5) and so a + = 1; a£= - 1 (1) follows by simple calculation. Also we have by calculation. ui = 5-'/* ( a - 0) = 1 and u 2 = 5-'/t (a 2 - £ 2 ) = 1. Now suppose that for re = m, and re = rei + 1, u„ = 5 -1 /* (a n — n ). Then "m+2 = "m + 1 + Urn = 5-Vt{a m (a + 1) -/?"' (/?+ 1)} = 5- , Ma m+2 - m+s ), since a and /? are the roots of 0- = + 1. The proof of (2) for re > 1 now follows by induction. To prove (2) for re <L 0, we observe that SQUARE FIBONACCI NUMBERS 79 uo = 5-* (a - 0°) = 0, u_i = 5- , /t( a - 1 -y?- 1 ) = 1. Now suppose that for re = — rre and re = — m — 1, u» = J-"A (a n - 0"). Then u_ OT = u-m-i + M-m-2. and so U-m-2 = M-w — re_ W) _i = 5' 1 '* {a'" 1 - a-" 1-1 - 0~ m 4- y?-" 1-1 } = 5-*/* {a-" 1 " 2 (a 2 - a) - /5~ m - 2 (0* ~ 0)) = 5 -1 /* (a _m - 2 - /?~"»-2) since a and satisfy 2 — = 1. Again the result follows by induction. This proves (2); (3) follows in just the same way. Proof of (4) ared (5). We now have Um o n + u„ v m = 5- l '*{(a m - m ) (a n + n ) + (a n - n ) (a* + m )} = 2 . 5- 1 /' (a m+n - m+n ) = 2 a m + n and 5 u„, u„ + 9* o n = {a m - m ) (a n - n ) + {a m + m ) (a n + /?") = 2 (a m+n + m+n ) =2o„ 1 + n . Proo/ of (6) and (7). u_„ = 5- ,/ *(a- n -/?-") = 5- , / ! {(-^)n-(-a)»} by (1) - (- I)"" 1 v n , and D_ n = a _n + 0~ n = (-0)n+ (-„)» = (-D n P„. 80 EXPLORING UNIVERSITY MATHEMATICS 1 Proof of (8). Putting n = m in (5) we have 2 v 2m = 5 u m ! T D m 2 and putting n = — m in (5) we have, in view of (6) and (7), 4 = 2D =(-D'*- 1 (5u /n 2 -D w s). (i) Thus 2 D2m + ( — l) m 4 = 2 p TO 2 , which on rearrangement be- comes (8). Proof of (9) and (10). Putting m = 12 in (5) we obtain SQUARE FIBONACCI NUMBERS 81 thus 2 D n + 12 = 5 Uj£ U n + D12 P„ = 720 u„ + 322 p„ On + 12 = 360 U* + 161 p„ = p„ (mod 8). Thus since 03, dq, P9 and P12 are the only even p r for 1 ^ r ^ 12, it follows that 2 | p OT if and only if 3 | p w . Proo/ of (11), (12) and (13). Suppose that 4 j m, that is m = 2 M, where M is even. Then by (8) v m = dm 2 - 2, and so 3 | o m would imply that 3 | (p ro + 3) = pa< 8 + 1, and this is impossible by the lemma, since clearly (1, 3) = 1. This proves (11). To prove (12) and (13) we observe that if d = (u„, p„) then d 2 divides 5 n* 2 - o n 2 = ± 4 by (i). Hence d = 1 or 2. Also by (i) u n and v n are either both odd or both even, and so d = 1 if o n is odd, and d = 2 if o n is even, that is by (10). d = 2 if 3 I n and d = 1 otherwise. This concludes the proof of (12) and (13). Proof of (14). If 2\k and 3 + fc, then fc = 2, 4, 8 or 10 (mod 12), and so by (9), Dk — 1>2, D 4t *>8 or D10 (mod 8) = 3, 7, 47 or 123 (mod 8) = 3or7 (mod 8) = 3 (mod 4). Proof of (15) and (16). If k is even, then by (4) putting n = m = k, U2k = u k v k = (mod p a ), and by (8) v»k = o/. 2 — 2 = - 2 (mod P*), and so using (4) and (5) we obtain 2 Dm* 2 fc = 5 U m Iff* + D m D2k = ~ 2 D m (mod D/t), am 2 U OT + 2/j = Um»2A + U2AOm= ~ 2U„, (modD/t), and we have (15) and (16) on dividing by the factor 2, which is legitimate since by (10) d* is odd. Proof of statements on p. 76. Suppose that ab = x 2 and (a, b) = 1. Then suppose if possible that there exists a prime p which occurs in the factorization of either a or b (say a) raised to an odd power, 2 k — 1 say, where k i> 1. Since p | a and (a, b) = 1 it follows that p t b, and so p 2 k ~ 1 1 ab but />-' fc -I ab. Hence p 2 k_1 1 a: 2 but p 2 k % x 2 , which is impossible since if p tk ~ l I x 2 then p k | x and so p 2fc |^; 2 . Hence every prime factor o:' either a or b occurs in the factorization of a or b respectively raised to an even power and so we must have a = A* and b = B 2 . Now suppose that ab = x 2 and (a, b) = 2. Then 2 | a, 2 | b, so 4 I x 2 hence 2 I x. Let a = 2 a, b = 2b' and * = 2 *'. 82 EXPLORING UNIVERSITY MATHEMATICS 1 Then a' b' = x' 2 and now (a', b') = 1. Thus by the previous result, a = A s and b' = B 2 and so a = 2 A 2 and b = 2 B 2 , which was to be shown. CHAPTER 6 Digital Computers and their Applications M. Levison In an age of major technological advances, there can be few with such far-reaching potential effects as the development of the electronic digital computer. In a span of less than 20 years, the computer has emerged from the laboratory to become an extremely powerful tool touching the everday livcs of a large part of the population. The pay-slip of the factory-worker, the rate-bill of the householder, the house- wife's grocery list, the schoolboy's examination marks — these are but a few of the many items which at some stage in their existence may be processed by computer. One of the less happy results of the computer's phenomenal rate of progress has been the air of magic and mystery which has grown up around it. Yet there is no fundamental reason why this should be so, for the basic principles underlying it may be readily understood by the layman with little knowledge of mathematics and none at all of electronics. The present article is in two parts. The first of these describes what a computer is and explains the purpose of its various individual sections; the second outlines briefly a few of its many current applications, to give the reader some impression of the extent of the computer's vast potential. 85 84 EXPLORING UNIVERSITY MATHEMATICS DIGITAL COMPUTERS AND THEIR APPLICATIONS 85 Let us begin with a description of the computer and the five basic constituent units of which every computer is built. In order to describe the part which each of these units plays in a calculation, it will be convenient to draw an analogy with a clerk working in a company pay office, whose weekly task is to calculate the amount of pay due to each of the company's employees, given certain items of data, such as the number of hours each employee has worked that week, his rate of pay per hour, his income-tax code number, and so on. Consider then the fundamental activities which the clerk must undertake in carrying out these calculations. The first operation in evaluating each individual's pay will probably be to multiply his hourly rate of pay by the number of hours worked in order to arrive at his gross pay for the current week. Thus the clerk will have to be able to carry out simple arithmetic operations, such as addition, subtraction and multiplication. For this purpose he may be provided with a pencil and paper, or preferably, for greater speed and accuracy, with a desk calculating machine. Another task which the clerk must perform is to control the "flow" of the calculations, that is to say, he must decide which arithmetic operations to carry out at each stage, on which data the operations are to be performed, and what to do with each of the results he obtains. This he does by acting in accordance with a set of instructions which he has previously been given, perhaps by bis boss on his first day in the pay office, or perhaps (in effect) by a school mathematics teacher some year earlier. This set of instruc- tions the clerk has throughout the calculation to retain in his "memory", either literally or artificially, by committing them to a piece of paper which he can read whenever necessary. In the course of the calculation, certain intermediate results may be produced which will be required for further compu- tation at a later stage. These, too, must be recorded in the (natural or artificial) memory of the clerk. A further task which the clerk may be called upon to perform is that of taking certain very elementary decisions. In order to achie\p greater flexibility in devising the plan of a calculatioiiv it may be found necessary that the list of instructions given to the clerk should contain certain branch points. He may, for example, be told that if an employee's tax code number exceeds 50 then he is to obey one set of instructions, while if it is less than or equal to 50 then he is to obey an alternative set. The clerk must therefore have the ability to decide the relative size of different numbers. The items described above are in fact the basic ingredients of any calculation, and each of the clerk's activities has its counterpart on the computer. Like the pay clerk, the com- puter, too, must be able to carry out arithmetic operations. For this purpose all computers are provided with what is called an arithmetic unit. Pairs of numbers can be sent to this unit, and their product or their sum or their difference or their quotient can be obtained from it. It would be quite possible to use the arithmetic unit of a computer in exactly the same way as one uses a desk calculator, with the human being sitting at the machine sending numbers to the arithmetic unit and receiving the appropriate results back again. It would, however, be very uneconomical to build an expensive unit capable of adding and subtracting numbers in times of the order of 1 millionth of a second (on the most recent machines), if the operator is only going to be able to insert the numbers at a rate of about one every 5 or 10 seconds. The computer is therefore provided with its own control unit and memory unit. Into the memory unit the list of instructions (called a programme) is placed by the human operator before the calculation begins. The control unit then takes these instructions and carries them out one at a time, sending numbers when appropriate to the arithmetic unit and dispatching the results received to their correct destinations. The memory unit may be used not only for storing the instructions which the computer is to 86 EXPLORING UNIVERSITY MATHEMATICS 1 obey, but also for storing the intermediate results which the human being might write down on paper. Like the human pay clerk, the computer has an element of discriminatory ability. It, too, can compare two numbers stored in its memory, decide which is larger (or whether they are equal), and follow alternative paths of instructions according to the result obtained. So far then we have encountered three of the basic, constituent units of the computer. There are two others to consider. The first of these is its input unit, which enables the computer to receive information, both instructions and data, from the outside world. Information to be communi- cated to a human being must be passed to him by way of his sensory organs, most frequently his eyes or ears. In the case of the pay clerk, he will receive his instructions and data either in the form of the spoken word, or in the shape of a document which he may read, or possibly in braille which he can detect by touch. The sensory organs of a computer usually take the form of devices capable of detecting photoelectrically the presence or absence of holes punched in a card or a length of paper tape. The hole patterns, which may be interpreted as binary digits, are transmitted by the device to the computer memory. Now the human being is taught at an early age to recognize certain groups or patterns of sounds which make up his native language and to assign to them certain specific meanings. By using this language, any other person may readily communicate with him. In the case of the computer certain patterns of digits have a built-in meaning as instruc- tions or data determined by the computer's electronic circuits. These form the basic "language" of the computer and we may communicate information to the computer by expressing it in this language. The fifth of the basic units of the computer is its output unit, whereby it may convey information, for example, the results of its calculation or details as to how the computation DIGITAL COMPUTERS AND THEIR APPLICATIONS 87 is progressing, back to the outside world. For this purpose the computer may perhaps be made to punch further holes in cards or in paper tape, or to operate some form of electric typewriter. A functional diagram of a computer illustrating the relationship between the basic units is shown in Fig. 6.1. CMINt HI iwu: urni OUTNT UNIT Renin d CUMl'tML i l«HUl <l WttMI.C •lenliori MMMM ■ 'Uaill •• Maim *.mfr*'t I'lG. 6.1. Diagram showing the basic units of a computer, and their interconnections. In addition to the connections shown, all units receive control signals from the control unit. To gain some idea as to what is involved in expressing the plan of a calculation in the language of the computer, we return once again to our analogy with the pay clerk. Once he has calculated the gross pay of any employee, his next task is to perform a more elaborate series of calculations resulting in the deduction of PAYE income tax. Now the actual means whereby such deductions are assessed are very complicated, and to understand them fully involves a careful study of successive Finance Acts. In order, therefore, that the system may be operated by employers not expert in tax affairs, the Inland Revenue prepares a special booklet comprising a set of instructions and also some tables of figures. For each employee, there is provided a tax code number, determined by the allowances to which he is entitled, and a card divided into eight numbered columns. The instructions, expressed in a slightly simplified form, begin as follows: 88 EXPLORING UNIVERSITY MATHEMATICS 1 (1) Write the employee's gross pay for the current week in column 2 of the card. (2) Add this to the entry (made the previous week) in column 3, writing the sum back in column 3. (3) Obtain from the tables the amount af tax-free pay to date corresponding to the particular date and tax code number, writing this in column 4. (4) ... The instructions continue in a similar manner until ultimately at the end of the calculation, the amount in column 7 is the tax due for the current week, while a further subtraction gives the employee's pay after tax has been deducted. What has in fact been achieved is that the elaborate process of evaluating the income tax due has been broken down into a series of simple operations, such as recording a number on a card, adding two numbers together, and so on. The pay clerk has in effect been programmed by the Inland Revenue in much the same way that a computer must be programmed by a human user. The computer is capable of certain basic elementary operations, such as recording a number in some given cell of its memory, adding together two numbers stored in its memory, and so on. More complicated operations must be broken into steps such as these. Nevertheless, using only these very simple operations, one can build up the most complicated and complex calculations. Indeed, it is possible to programme the computer to manipulate not only numbers, but also symbols and alphabetical data, for, as we shall see later, it is quite simple to represent the latter by some numerical form of coding. The reader may perhaps be tempted to ask whether the human user might not be able to perform the calculations himself in the time taken to write so detailed a programme DIGITAL COMPUTERS AND THEIR APPLICATIONS 89 of instructions. The analogy with the pay clerk again provides the answer. It is clear that he need only be given instructions on pay calculations for one of the employees Thereafter, he may evaluate the pay of any other employee simply by repeating the same set of calculations, but with fresh data applicable to that employee: a different rate of pay, another tax code number, and so on. In fact a great many calculations contain repetitive parts like this. Fur- thermore, it is found that there are many sequences of computer instructions, carrying out some particular operation which occurs over and over again. One may, for example, want to calculate the square root of a number many times during a calculation, and in many different calculations. Exactly the same sequence of computer instructions can be used each time. Sequences of calculations of this type are known as subroutines. Over the course of time a great set of these subroutines may be gathered together, thus sub- stantially reducing the time taken to programme a new calculation. By this stage the reader should have at least a rudimentary picture of the structure and the functioning of a digital computer. Let us now turn to considering some of the purposes for which computers are being used. The bread and butter applications of computers are to be found mainly in the commercial world. We have already seen that a computer may be used to carry out pay-roll calculations, and there are many other programmes that fall under the heading of business applications. Computers, for example, are now used by many large business concerns to keep an inventory of their trading stock. This inventory is updated each day or each week, and the computer reports when the quantity of any stock item falls below a critical level so that the goods may be re-ordered. In some cases the comptitcr is also provided with such details as the cost of each item, the time delay in receiving ordered goods, and the average frequency with which the item is requested by 90 EXPLORING UNIVERSITY MATHEMATICS 1 customers. It is then programmed to determine the best balance between a heavy cash outlay in re-ordering stock and loss of trade through not having the stock available when called for. Banks, too, have begun to use computers to file and update details of their clients' accounts. Evidence of this is to be found in the appearance during recent years of curiously- shaped figures at the bottom of many cheques. These figures are carefully designed to allow them to be read directly into a computer by a special piece of apparatus, without the need for preparing punched cards or paper tape. At the present time the majority of banks use these figures to record only the code number of the branch. At least one major bank, however, has reached the stage of recording not only the branch number but also the cheque number and the customer's account number, while the amount of the cheque is also inserted once it has been used and returned to the bank. The entire processing of the cheque may then be handled by computer. It is only by adopting techniques such as this that banks will in the future be able to keep pace with the growing volume of business which they will be called upon to undertake. Calculations of the type just mentioned are often referred to as data processing. Data processing operations are characterized by the fact that a large quantity of data is fed into the computer, a large quantity of results are output, but only a fairly simple series of calculations are performed on them. Thus, although any computer whatever would be capable of performing the calculations, these would be done most efficiently on a machine with extensive input and output equipment and only a simple arithmetic unit. In contrast, there are many other problems in which the amount of input and output is light while the quantity of computation is very large. For such problems it may be more efficient to use a computer with a more complex arithmetic unit but with fewer input and output devices. Larger establish- DIGITAL COMPUTERS AND THEIR APPLICATIONS 91 ments with varied problems may, of course, possess computers having the best of both worlds. Such computers will carry out efficiently all types of problems. Computers are also used extensively for calculations in the scientific and engineering fields, including both calcu- lations hitherto undertaken by human endeavour, and calculations which on account of their size could not previously have been contemplated. To give an example, in recent years a number of individuals have been launched both by the USSR and the USA to orbit the earth in artificial satellites. None of these launchings could have taken place without the assistance of the computer at a great many points. A particularly important task for the computer occurs in the moments immediately after the rocket has been launched, for it is then essential to determine as rapidly as possible, whether the satellite will enter a satisfactory orbit. For this purpose, the position of the rocket must be carefully observed at short intervals of time during the first few seconds of flight. These positions may then be substituted into the appropriate equations, and details of the orbit calculated. Were this task to be undertaken by a team of mathematicians, a great many hours or even days would elapse before the solution was known, by which time the unfortunate astronaut might have landed far from the nearest rescue ship in the middle of the Pacific Ocean. With the help of a computer, however, details of the orbit may be determined within seconds, and the astronaut ejected from his rocket if the orbit is not satisfactory. A situation of a similar type occurs in weather prediction, where barometric observations, etc., may be used to set up equations describing the motion of the atmosphere. These equations must, however, be solved within an hour or two if the results are to have any value in forecasting weather conditions before they actually occur; so that, here again, the use of a computer is essential. 92 EXPLORING UNIVERSITY MATHEMATICS 1 In addition to their use in numerical calculations, com- puters have also been applied to the translation of languages. The reader may well wonder how a machine designed primarily to perform calculations on numbers may carry out operations on the words of a language. This may be quite simply achieved, however, by coding the linguistic data into a numerical form. We may, for example, choose to code letter A as 01, B as 02, C as 03, . . ., Z as 26. Then a word such as CAT would be represented by the number 030120, and BEAST by 0205011920. Once coded in this way, the words may obviously be stored in the computer memory and may be compared for equality or alphabetical order, just as numbers are compared for size (indeed, if we adopt the coding system just proposed and if the numbers are aligned by the left, then the alphabetical order of the words is the same as the numerical order of the numbers represen- ting them). A rudimentary form of translation might proceed as follows. In the computer memory there would be stored a long list or "dictionary" of, say, French words. In adjacent memory cells would be stored for each of these its English equivalent. Given a text in French, each word would be read into the computer, looked up in this French dictionary (possibly by comparing the text word with every dictionary word in turn until an equal one is found, though there are much quicker methods), and the English equivalent output. In this way, the computer would produce a word for word translation of the given text. Now as a translation this effort would be of little use. It is apparent, however, that the machine might go much further; for one can readily store in the computer memory not only words but other coded information, such as grammatical details, and so on. Opposite each French word in our dictionary, we might have not only its English equivalent, but also some information telling us whether this particular word is a noun or an article or an adjective or DIGITAL COMPUTERS AND THEIR APPLICATIONS 93 a verb. Instead of reading the text a word at a time and printing its translation, the computer might read an entire sentence at a time and look up each of the words in the dictionary. It might then make various adjustments to these words according to the grammatical information. If in French, for example, the computer detected a noun followed by an adjective, it could be made to invert the equivalents to their more normal English order. Close study of the translation problem reveals a number of other difficulties many of which have yet to be overcome. They form, however, an interesting field of current research. Apart from the actual translation of languages, computers have also been used in the furtherance of literary studies; to help in the preparation of the detailed word-indexes which literary scholars call concordances, to collect from texts statistical data which may be used in the study of authorship, and to assist in the jig-saw-like problem of reconstructing small fragments into manuscripts. Another problem area to which computers have been applied is the playing of games. The games concerned fall into one of two groups. For some, such as "noughts and crosses" and "Nim", a complete solution is known, either in the shape of a set of rules ensuring a win for one side, or (which amounts to much the same thing) in the form of an exhaustive analysis of all possible positions. Computer programmes to play this type of game present few difficul- ties, and are of little real interest. Rather more interesting are those games, such as draughts and chess, for which no winning system is known and for which the vast number of possible positions makes exhaustive analysis wholly impractical. The draughts-playing pro- gramme of A. L. Samuel is a good example. This plays by looking at all possible positions two or three moves ahead, and pursuing even further those positions in which develop- ments (i. e. exchanges or jump moves) are imminent. The resultant positions are then evaluated in terms of features 94 EXPLORING UNIVERSITY MATHEMATICS 1 such as material advantage, mobility of pieces, and so on, the computer selecting the move which leads to the best score available, subject to the assumption that its opponent will attempt the opposite. Two other noteworthy aspects of this programme are that it retains in its memory details of board positions encountered in play for use in later games, and that it can, in the light of success or failure, modify the weights attached to the different features used in evaluating positions. Thus, after a number of games, the programme "learns" to play a better game. Indeed such was the improvement shown in the skill of Dr. Samuel's programme, that it was able to defeat one of the leading players in the United States. The behaviour of this programme bears a close resemblance at a number of points to that of a human being. It is in fact one of a group of programmes which have been written with the intention of imitating human behaviour. Apart from their considerable entertainment value, the more serious motive behind them is an attempt to discover more about the working of the human mind. To mention but a few examples, programmes have also been written to play chess, to prove theorems in logic and pure geometry, and so on. Naturally in the limited space available this article has been able to discuss only a mere handful of the very many fields to which digital computers have been and are being applied. It is hoped, however, that this will have been sufficient to give the reader some idea of the vastness of their scope and potential, so that he may be better able to appreciate the impact which computers will make on the world of the future. Reference for further reading Samuel, A. L., Some Studies in Machine Learning using the Game of Checkers, in Computers and Thought (ed. E. A. Feigenbaum and J. Feldman), McGraw-Hill, New York, 1963. CHAPTER 7 The Isoperimetric Problem H. G. Eggleston The isoperimetric problem can be described as follows. We consider a closed curve which does not overlap itself and which lies in a plane. Fig. 7.1 (a) shows examples of the types of curve that we have in mind; Fig. 7.1 (b) D m Fig. 7.1. 95 96 EXPLORING UNIVERSITY MATHEMATICS 1 shows examples of curves that we do not consider. The curve will divide the plane into two parts of which one is enclosed by the curve and the other is not. The points in the part of the plane enclosed by the curve form a set which we refer to as the "enclosed set". We can now state the isoperimetric problem which is, If we know the length of the curve, what is the largest possible value for the area of the enclosed set? In other words we consider the class of all such closed curves which have a given length and we wish to describe, in some form or another, that curve which encloses the largest possible area. In very simple cases we can actually calculate the area; for a square, equilateral triangle or circle of perimeter length L the enclosed set has area L 2 /16, L 2 /12]/3, Dl±n respectively. But to solve the isoperimetric problem we must consider curves for which it is quite impossible to calculate the area of the enclosed set, and we need quite different techniques. This problem is an interesting one for several reasons. It illustrates in dramatic form the difference between asking a question and finding the answer, or rather of establishing the validity of the answer. The problem had occurred to the Greek geometers of over 2000 years ago but its complete solution was not discovered until the 1880's. This solution, which is that the curve must be a circle, was known to the Greeks, but it is one thing to know what the solution of a problem is and quite another to establish this solution in an adequately convincing form or, as we say, to prove it mathematically. Another reason for being interested in this problem is that although formulated in mathematical terms that are relatively concrete and close to physical reality, yet its solution depends upon abstract concepts that were not defined until comparatively modern times. Again the techniques developed to solve this problem are applicable in a wider context for attacking a whole range of problems, called by the generic title of Extremal Problems. Finally, the problem is one of a very general type in mathematical THE ISOPERIMETRIC PROBLEM 97 work, that of establishing some relation or connection between a property of a set (in this case the enclosed set), and a property of its frontier, (in this case the curve). One of the difficulties, which the Greeks came across, with this problem, was that of providing a suitable mathematical formulation. One of the difficulties in mathematics, as in most other intellectual disciplines, is knowing where to start. The Greeks started from geometrical concepts. Their basic elements were geometrical ones; points, lines, planes, and so on. Greek geometry, enunciated with such remarkable lucidity by Euclid and his fellow workers, dominated mathematical thinking and teaching in post-Renaissance Europe even until the beginning of the present century. The greatest mathematician of all time, Isaac Newton, felt that he must write mathematics in a geometrical form. He made discoveries by using his own invention, the calculus, and was able to prove results that would have been quite beyond the capacity of Euclid and the Greek geometers, but having once got his results, he felt that he had to clothe them in geometrical language. Somehow this made the most revolutionary discoveries respectable. At a later stage when Gauss had discovered non-euclidean geometries, he dared not publish these terrible heresies, because of a well- founded fear of the ridicule of his fellow mathematicians. Now, however, all is changed. Euclid is no longer a textbook at school or at the university, and we do not publish mathematical papers in geometrical form. Geometry of the euclidean type is definitely out, and non-euclidean geometry is equally definitely on the way out. We now base the whole of pure mathematics on the concept of sets. Sets of what? Well, not sets of anything very specific. Mainly sets of numbers and sets of sets of numbers and sets of sets of sets of numbers, and so on. Indeed, the idea of a numbeT can itself be defined in terms of sets. We need to make various precise assumptions about the nature of a set. When these have been made we believe that the whole of pure 93 EXPLORING UNIVERSITY MATHEMATICS 1 mathematics can be deduced from them by entirely logical arguments. It may, of course, be arguable at any one time as to what is logical and what is not, but there is generally a consensus of opinion which will agree as to what is valid and as to what is not valid, even though these ideas of validity change from time to time and what we consider to be a valid proof now, may well be regarded by our mathematical successors as being merely a primitive approximation to a logical argument. Our problem, then, has to be assessed in terms of mathe- matical concepts. We have used the expressions closed curve, enclosed set, area, length and also the idea of largest. Each of these concepts is to be translated into mathematical language in a rigorous way. When this translation has been done we can only operate mathematically on our new concepts. It is no longer permissible to use any of the physical pro- perties from which we abstracted to secure the mathematical analysis. Once we have defined our mathematical ideas we leave the physical world behind completely; we live exclusively in a mathematical world which has nothing in common with reality and in which it is completely inad- missible to do anything except mathematics. It may be wondered that in this situation, mathematics ever produces results which have any contact with the physical world at all. If the arguments that we use are divorced from physical reality, it does seem rather surprising that the results should yet have physical applications, but we can look at the matter rather in the light of a translation from one language into another. If we regard physical pheno- mena as being some sort of a language and a mathematical abstraction of these phenomena as being some sort of a translation of them, then it is feasible that the process of translation into mathematics and of the use of logical arguments in that mathematics should produce results which THE ISOPERIMETRIC PROBLEM 99 can be re-translatcd into an interpretation of the physical environment from which they were first abstracted. One of the geometers who worked on the isoperimetric problem in the last century was Jacob Stciner. He produced a number of solutions whose validity was later challenged by Weierstrass on the ground that they contained an unproved assumption. Although there was this gap in his arguments, Steiner's ideas have proved to be very important in this and allied problems. Basically Steiner's idea was as follows. Firstly, be believed (correctly) that the solution to the problem was a circle. Suppose that we denote the circle whose perimeter length is L by C then the area enclosed by C is L 2 /4jt. Consider some other closed curve of length L, say K, enclosing a set of area A. If we can show (L 2 /47i) ^> A whatever K is considered, then we shall have solved the isoperimetric problem. But this is difficult; we cannot establish this inequality directly because C and K may be of very diverse shapes and quite different from one another. Instead of making the comparison from K to C directly, let us try to go by small steps from K to C. Suppose that we could find a new curve K* of length L enclosing an area A* so that A* ^ A, and suppose further that if K is not a circle then we can find K* so that A* > A (a strict inequality), then shall we not have solved the isoperimetric problem? Steiner thought that we should, but actually there is a gap in his argument. If we were able to establish the situation described above, then we should have proved the assertion "If K is not a circle then K is not a solution to the isoperimetric problem"'. This is logically equivalent to "If the isoperimetric problem has a solution then that solution is a circle" but it is not logically equivalent to "The circle is the solution of the isoperimetric problem". One has to assume that the iso- perimetric problem actually has a solution; that is to say one has to assume that there actually is a curve that has length L and, of all such curves, encloses the largest possible 100 EXPLORING UNIVERSITY MATHEMATICS 1 area. This may not be the case at all. In fact there are similar questions which do not have a solution in this sense. For example, the Kakeya problem asks "What is the least area of a plane set, which has the property that a segment of unit length can be rotated completely round inside the set through 360° back to its original position?" In fact given any positive number s one can find a plane set of area less than e having this property, but one cannot find such a set with s replaced by zero. Thus the above question is wrongly phrased. In the Kakeya problem there is no set of least area. Perron has illustrated the gap in Steiner's argument in a striking form. We know that there is no largest positive integer or whole number. If we assume for the moment that there is such a largest positive integer then we can prove that it is 1 by an argument similiar to Steiner's. We consider any integer n and transform n to n 2 . Now n 2 ^ n and if n=j=l then n 2 > n. Hence no integer, except possibly 1, is the largest. Therefore 1 is the largest possible integer. This nonsense follows from the false assumption that there is a largest possible integer. Steiner's assumption that there was a solution to the isoperimetric problem was not false but did need to be established as correct. In fact this has been done and we use the concept of compactness to determine a wide class of problems which we know have solutions. Although Steiner's argument had this gap, his idea of approach has proved a very fruitful one in many fields. We shall now assume that the isoperimetric problem has a solution and consider how Steiner put his programme into effect. For simplicity we shall restrict our consideration to convex curves. They will be closed curves in which the direction of the line of motion (or tangent) rotates in one and the same sense as we describe the curve once. Figure 7.2 (a) gives examples of convex curves, Fig. 7.2 (b) exam- ples of non-convex curves, and Fig. 7.2 (c) a curve which, although it satisfies this condition on the rotation of the THE ISOPERIMETRIC PROBLEM 101 (ai Convex closed curves (hi Non-convex closed carves (ci A curve that satisfies Ihe condition on the sense ot rotation ot Ihe directed tangent Out is not closed Fig. 7.2. tangent, yet does not qualify for consideration, since it is not closed. A property of a convex curve that we shall assume is that a straight line either (i) does not meet it at all, or (ii) meets it in one point, or (iii) meets it in two points or (iv) meets it in a segment (Fig. 7.3). Given the convex curve K of length L Steiner's aim was to find a second curve K* of length L enclosing an area that 102 EXPLORING UNIVERSITY MATHEMATICS i Fig. 7.3. was strictly larger than that enclosed by K, unless K was actually a circle. How was he to define K* from K? He considered the properties of circles, particularly those which characterized circles amongst all convex curves, and then used the fact that K, if non-circular, did not satisfy these properties. One such property that circles possess is that J he angle in a semicircle is a right angle. We can phrase THE ISOPERIMETRIC PROBLEM 103 this as follows: if C is a circle and a, b are two points on C such that the two arcsf into which a, b divide C are of equal length then any point c on C is such that Z. abc = \n. Moreover this property characterizes circles, if a, b are given fixed points and c a variable point such that ^ abc = I n, then c lies on a circle of which a and b are diametrically opposite points. Suppose then that K is not a circle and that p and q are two points on K dividing it into two arcs of equal length then there must be a point r on one of these arcs such that ^L prq =j= \ n. (See Fig. 7.4.) Denote the two arcs of K with end points p and q by A'i and K z . Suppose that K x is that arc which contains r, and that it is itself divided into arcs K u , £12, by r, where £]] has end points p, r and K i2 has end points r, q. If we move q along the line pq to q x and r on the circle centre p radius pr to r x so that distance n q, = rq, then we can construct arcs congruent to K n joining p, rj and congruent to Ki 2 joining n, q h These two arcs form an arc Kf with a total length equal to that of £, and enclose/with segment pqu an area (say T) which differs from the area enclosed by segment pq and K x by exactly the amount that the area of the triangle pn q x differs from that of prq. If ^ prq > h 71 take qi nearer to p than q; if ^ prq <h n take g, further away from p than q. In either case if q x is near q the area of the triangle pn q\ exceeds that of the triangle prq. Thus the arc Kf has the sort of properties that we require, but of course we still have to modify the arc K z . We can do this in an exactly similar way provided that we can find a point s on it with ^ psq =|= J n and such that if ^ prq > \n then ^_ psq > In whilst if ^ prq < %n then ^ psq < hx. This could certainly not be done if K 2 was a semicircle. So Steiner put a preliminary argument in to cope with this t By arc we mean a connected part of a closed curve. It need not be an arc of a circle. Any two points on a closed convex curve divide the curve into two arcs. 104 EXPLORING UNIVERSITY MATHEMATICS 1 possibility. He observed that he could choose the points p, q on K so that (i) they divided K into two equal length arcs as before and (ii) neither of these arcs was a semicircle. To see this we have to show that if K is a convex curve with the property that when we divide K into two equal length arcs by two points p, q, then one at least is a semicircle, then K itself is a circle. Take any two points p, q dividing K into two arcs K\, K 2 of equal length. Suppose K 2 is a semicircle. If K\ is a semicircle then K is a circle. If K\ is not a semicircle there is a point t on K\ that does not lie on the circle of which £2 is a semicircle (Fig. 7.5). Let u be the point of K such that tu divides K into two equal length arcs. Denote these arcs by £3, K,\. By hypothesis at least one of K3, K 4 is a semicircle. Each of these arcs has an arc in common with £2 and therefore, if it is a semicircle, lies on the same circle as K 2 does. Thus t lies on this circle. This is a contradiction, which shows that our original assumption was false, and K was in fact a circle. Thus Steiner's subsidiary argument is proved. Given that K is not a circle we can find two points p, q dividing K into two equal length arcs K\, K 2 neither of which is a semicircle. THE ISOPERIMETRIC PROBLEM l05 Next, in order to make sure that we can pick r on K t and s on K 2 with Z.prq, </psq both greater than In or both less than hn, we select that one of K u K 2 that with segment pq encloses the larger area (if both enclose the same area take either) and reflect it in line pq. Suppose that we had selected Kt and that £1 and its reflection K\' in pq form the closed curve £3 (see Fig. 7.6). Take r on K t as before, r' its reflection Fie. 7.6. in line pq on K% f . Move q along pq and define £,'* from K\' in precisely the same way that K x * was defined from £,. The two arcs K x * and K\* together form a closed curve K 4 that encloses a larger area than that enclosed by K and is of length L. K 4 need not be convex. It may fail to be convex at only the four points p, q, n, n'. We can replace part of the curve K 4 near each of these points by a straight segment (the operation at p is shown in Fig. 7.7) to produce a convex curve £5 whose length is less than or equal to that of K A (because a straight line segment is the shortest distance between two points) and enclosing an area at least as large as that enclosed by K 4 . If the length of K 5 is L u expand K 5 by a similitude with the ratio LlL\ [transforming the point (x, y) of the plane into the point {{LlU) x, {LIU) y)] to obtain a new curve K*. 106 EXPLORING UNIVERSITY MATHEMATICS 1 THE ISOPERIMETRIC PROBLEM 107 Fic 7.7. Finally, K* is of length L, is convex and encloses an area larger than that enclosed by AT. Thus Steiner's procedure is completed. An alternative method devised by Steiner was to symmet- rize the set concerned. We make use of the fact that any con- . Fig. 7.8. vex curve encloses a convex set, and that such a set meets a straight line in either a point, a segment or in no points at all. Consider a convex curve K and a straight line X (see Fig. 7.8). For each point p on X denote by m p the line through p and perpendicular to X. This line meets the set enclosed by K in (i) no points, or (ii) one point, or (iii) a segment. Take on m„ in case (i) no points, in case (ii) the point p and in case (iii) the segment whose midpoint is p and whose length equals the length of the segment in which m p meets the set enclosed by K. We do this for every p of X. The totality of all the constructed points forms a set that is enclosed by a convex curve denoted by K . The proof of this is omitted. It can also be proved that the length of K ? does not exceed that of K, and, indeed, that the length of K x is strictly less than that of K unless K x is itself symmetric about some line parallel to X (in which case K x is a translation of K). Here "symme- tric about a line" means "coincides with its reflection in the line I shall not prove any of these statements. But if we assume that they are true, and if we assume that there is a solution to the isoperimetric problem, then the properties imply the following. If, say, the curve T is a solution of the isoperi- metric problem and we take a particular direction then there is a line l lying in the direction 6 such that f coincides with its reflection in l . Take two perpendicular directions and let the corresponding lines /,,, l v+x/t intersect at O (see Fig. 7.9). Then all the lines l pass through O. For, if this were false, select one l not passing through O and let the line through O perpendicular to this l Q meet f in p and q. Since F is symmetric about Iq one of p, q is nearer to O than the other, say p nearer than q. Since r is symmetric about l v and l v+3t/i , is convex and contains q, it also contains the rectangle with centre O and one vertex at q. But p is an interior point of this square which is impossible as p lies on f itself. This is a contradiction. 10S EXPLORING UNIVERSITY MATHEMATICS 1 Fig. ?.9. Hence Iq must pass through O and Iq is simply the line through O in the direction 0. T is symmetric about every line through O. But then T must be a circle, centre O. Otherwise there are two points h, j on T whose distances from O are not equal (see Fig. 7.10). Take the line I which is the internal bisector of angle jOh. r is symmetric about /. Thus t passes through points j, h', h, j' in some order, where h' and j' are the reflections of h, j in I. This contradicts the convexity Fig. 7.10. THE ISOPERIMETRIC PROBLEM 109 of r, but I shall not give a detailed proof of this. Thus again we have a solution of the isoperimetric problem. These solutions of the isoperimetric problem by Steiner are similar in the sense that they depend upon a modification of an extremal curve. The procedure is: (i) assume there exists a solution, (ii) modify the solution curve in some such way that, using the fact that it is a solution, we can identify it geometrically. Many geometrical problems can be tackled in this way. The power of the method lies in the vast range of possible modifications available, the limitation of the method lies in the fact that there are only a few curves that can be identi- fied geometrically. The method is also going to be difficult to apply if there is more than one solution, i. e. if in addition to a circle there had been some other curve that solved the isoperimetric problem, our modification would have had to be chosen so as to distinguish these two curves from all other convex curves. This is usually not possible. The modifications used by Steiner were obtained from two properties of circles: (i) an angle in a semicircle is \ n, and, (ii) a circle coincides with its reflection in any line through its centre. One of the advantages of Steiner's approach to the isoperimetric problem is that it gives us a method by which many other problems can be tackled. An entirely different approach is due to the Danish mathematician Tom Bonnesen. The idea behind this approach is to consider not one convex 110 EXPLORING UNIVERSITY MATHEMATICS 1 THE ISOPERIMETRIC PROBLEM 111 curve so much as a family of associated convex curves. The existence of this family enables us to construct a more elaborate mathematical structure, which makes more of mathematics available and leads to a simpler solution. Consider a convex curve C of length L (see Fig. 7.11). About each point p of C as centre construct a circle of Fig. 7.11. radius x. We get a family of circles whose envelope we denote by C x . As x varies, taking positive values, we get a family of curves and it is this family of curves that we operate with instead of C itself. To begin with, each curve C x is convex. I shall not prove this. Next if C is of length L and encloses an area A then C x is of length L + 2 tix and encloses an area A + Lx + nx 2 . These results are because C is convex. They may be proved by prov- ing them first in the particular case when C is a polygon, and then approximating to a general convex curve C by convex polygons. The proof when C is a polygon is easy because then C x is bounded by segments equal and parallel to the sides of C and by some circular arcs. For example in Fig. 7.12 the heavily shaded area can be put together to form a circle of radius x. Their total area is tix 2 . The lightly shaded areas have an area Lx and the dotted portion is the set enclosed by C, hence of area A. I shall not attempt to validate the approximation argu- ment. The area of C x is A + Lx + tix 2 . This expression is quadratic in x. We consider it for all values of x both positive and negative. This is a typical mathematical manoeuvre. The negative values of x do not correspond lo any very obvious geometrical entity, but we can use them to throw light on the behaviour of the quadratic function for positive values of x. Consider the equation Fig. 7.12. nx 2 + Lx + A = 0. When does this have real roots? Completing the square, the equation is L\* L 2 A x + 2ti 4ti 2 n TJ Thus we have real roots if and only if — > A. But this 4:t 112 EXPLORING UNIVERSITY MATHEMATICS 1 THE ISOPERIMETRIC PROBLEM 113 last inequality is the assertion that the area of C does not exceed the area of a circle of perimeter L, i. e. it is the assertion that the solution of the isopcrinietric problem is a circle. Thus we have only to show that for any convex curve C nx 2 + Lx + A = has real roots in order to solve the isoperimetric problem. When x is large nx 2 + Lx + A is large and positive. Thus the equation nx* + Lx + A = will have real roots if and only if we can find some value of x (which would be positive or negative) for which nx 2 + Lx + A is negative or zero. Now there exists in the area bounded by C a circle of largest possible radius. Let this largest possible radius be r. Then we shall show that nx 2 + Lx + A is negative or zero when x = — r. We call r the inradius of C. We again consider only the case when C is a polygon. The general case can be deduced by an approximation argument that we shall not consider. The largest circle modification ol C Fig. 7.13. . inside the convex polygon C either touches two opposite parallel sides, or it touches three sides which, when extended, form a triangle containing C. We deal with this last case. The other case is slightly easier and we shall not consider it. The process is illustrated in Fig. 7.13. Let I be this incircle of C and let O be its centre. The three points of contact of I with C divide C into three arcs de, ef, fd as shown. Suppose that the sides of C containing d, e, f respectively when produced form the triangle ghj, then we modify C as follows. Each side of C that lies in arc ef we move parallel to the line gO towards O until it touches I or lies on a line touching J. Similarly each side of C that lies in arc ed we move parallel to jO towards O, until it lies on a line touching / and finally each side of C in arc df we move parallel to the line hO towards O until it lies on a line touching /. The final figure is not convex but it contains 7 and thus has an area exceeding nr 2 . Suppose that the sides of C in order are of length L\, Lo, . . ., L/ k . and that L,- is distant p,- from O where i = 1, 2, . . ., k. Then if, as before, C encloses an area A and is of length L, k k A = ShLi p t , L = 2 L it i l where the first equation is obtained by dividing C into triangles each with O as a vertex and two adjacent vertices of C as the other two vertices. When we modified C the side of length Lf was moved a distance (p; — r) until finally it was distant r from O. In this motion it described a parallelogram whose thickness perpendicular to the side of length Li was (p, — r). Thus this parallelogram was of area Li (pi — r). The total loss of area in the modification was k 2 Li (pi — r) and the final area exceeded nr 2 . Hence A - 2 Li (p { - r) ^ nr 2 , 114 EXPLORING UNIVERSITY MATHEMATICS 1 i. e. i. e. A-2A- Ttr 2 - rl tL > nr 2 , - A<0. Thus «** -f Lx + A is negative or zero if x = — r. Hence it has real roots and thus finally the circle is a solution of ihe isoperimetric problem. 1 his method is fundamentally different from Steincr's. We here make no assumption that there actually is a solution. Our method of proof shows that there is a solution at the same time that it identifies the circle as being a solution. So far by this method we have not shown that the circle is the only possible solution. But we can also do this. Firstly there is a smallest circle 7i containing C (again we assume C is a convex polygon and omit the approximation argument needed in the general case). Either there are two diametrically opposite points of h that belong to C or there are three points of 7i that belong to C and form the vertices of a triangle which contains Ou the centre of l\ t in its interior. We consider this second case only; the first case is similar but rather easier. Suppose the three points of 7i on C are d, e, f. Select points j, g, h lying in the interior of the angles less than n, ^L dOi c,ZeOif,Z fOi d. Move each segment of C in arc de parallel to 0\ j away from 0\ until ii lies on a line touching fr. Perform similarly on the seg- ments of C in arcs ef, fd using g, h instead of j. Let R be the radius of If, R is called the circumradius of C. The final figure includes Ii and encloses an area not less than ttR 2 . Hence A + 2Li(R- Pi ) >nR\ 1. C. 7i R 2 - RL + A < 0. Thus n X s + Lx + A is non-positive when x = — R. Hence the distance apart of the roots of n x 2 + Lx + A = is not less than R — r. Hence THE ISOPERIMETRIC PROBI.KM (V A\\ 2 >R-r. 115 Squaring gives L 2 — 4ti A i> tz 2 (R — r) 2 . Thus L 2 = 4 7i A only if R = r, i. e. C is a circle. We can prove the isoperimetric inequality for the class of all convex polygons by an entirely different argument. We can proceed by an argument which although it modifies the length and the area concerned, does so in such a way that the quantity L 2 — 4-tiA (which is known as the isoperimetric deficit, L being the length and A the area of the convex set), is reduced. The procedure also reduces the number of sides of the convex polygon. If follows that if we can show that the isoperimetric deficit is positive for a triangle, then it will follow that it is positive for all convex polygons. This argument is an attractive one because it makes use of the polygonal nature of the sets concerned. It will not be possible to apply it to convex sets which are not polygons but its application to polygons is particularly- appropriate. Consider then a convex m-sided polygon, say P, with length of perimeter L and area A, we shall assume of course that m is greater than 3. (See Fig. 7.14.) We shall construct a fc-sided polygon, where k is less than m, say Pi, of length of perimeter L\, and area A\ such that L 2 - 4nA>L l 2 - ItiAl Let Si, . . ., S m be the sides of P in cyclic order, and let B\, . . ., B m be the interior bisectors of the angles of P, B, bisecting the angle between S{ and Si + l if Km and B m bisecting that between S m and S\. Then every point of Bj has the same distance from Sj as from Sj + i and the points of the intersection of consecutive bisectors are at the same distance from at least three sides. As these points of 116 EXPLORING UNIVERSITY MATHEMATICS 1 Fig. 7.14. intersection are finite in number there is one or more of them which are at a least distance from the sides of the polygon P. Suppose that this distance is denoted by X. If we move the sides of P inwards by an amount X we obtain a new polygon which we denote by Pi. It has fewer sides than P and its length L\ and area A\ are related to those of P by the formula A = A x + XU + hX (L - U), i. e. A - A x = J X(L + L x ). Thus we have the relation (L* - 4 7i A) - (Li» - 4 it At) = (L - U) (L+ U) - 2 nX (L + Lt) The parts of the sides of the polygon P which are in excess of the length of the parallel sides of P\ can be put together to form a polygon circumscribing a circle of radius X. Hence L — L\ is larger than the perimeter of this circle, i.e. 2 n/.. Thus finally we see that the isoperimetric deficit of P exceeds that of Pi. THE ISOPERIMETRIC PROBLEM 117 The idea underlying this argument can be used to obtain an even more precise result. We have seen that the quantity L 2 — 4/Ti A is always positive or zero for any convex polygon, and is zero for a circle. In fact, it is also always positive for any plane non-circular convex set. We may prove this by an approximation argument starting from convex polygons. Thus we can regard the quantity L 2 — 4-tiA as a measure of the degree to which the given convex set differs from a circle, the larger this quantity, the less like a circle is the given convex set, and the set is a circle if and only if this quantity is zero. It is natural to consider whether there are other functions which we can define and which have similar properties. If there are such functions, it may be possible to compare their value with the value given above. There are two fairly obvious functions of this nature. One is L — 2 nr, where r is the inradius of the convex set. This function is always positive or zero, and is positive unless the set is a circle. Similar remarks apply to the function 2nR — L, where R is the circumradius. Thus we may hope to extend our results by comparing the isoperimetric deficit with some functions depending upon either L — 2nr or 2tiR — L; in fact, the isoperimetric deficit is greater than or equal to (I — 2 7ir) 2 and we can prove this by the following extension of the previous argument. Let P be a polygon whose sides are of total length L and which is of area A (see Fig. 7.15). Denote by P<j the polygon whose sides are parallel to those of P and at a distance d from those of P measured inwards. For small values of d, Pa has the same number of sides as has P. But as d increases, the number of sides decreases. Denote the length of Pa by La, its area by Ad and its inradius by r,i- Also denote by fa the area bounded by the polygon whose sides are parallel to those of P<* and which all touch a given circle of unit radius. As d increases from zero to r^, there are a finite number of values, say d\, . . ., d* at which the number of sides of Pd decreases. Suppose that we had two values of d, say x and y, both lying between the same 118 EXPLORING UNIVERSITY MATHEMATICS t Area (, 8' Perimeter lengTti 2 f, 5 The polygons p tf all have the same incenlre. Fig. 7.15. two consecutive d { so that P x and P v have the same number of sides. Then it is easy to see that if x is greater than y and x — y = d, then we have the following relations: A x = A y + L y d + f„d 2 , i-'X ~ LJ]/ ~T" 2 f y d, r x = r„ + <3, U = fy By simple algebraic manipulations, we conclude that the expression L x r x — A x — f x r x z is identical with the same expression with .v replaced by .»/. In other words this quantity is a constant between any two consecutive values of the d t . Consider what happens when the variable d is allowed to increase through one of the values of di. Each of the quantities L dy r dt A d , is a continuous THE ISOPERIMETRIC PROBLEM 119 function of d. The function fa increases at d, because the number of sides of P d decreases. Hence the expression given above decreases at di, and thus it decreases as d increases from zero to r. As d tends to r, L d , r d and A d all decrease to zero and f d maintains a certain value, thus f d rd 2 also tends to zero. In other words, the quantity which we have above tends to zero, but it is decreasing and must therefore always be greater than zero. In particular. take d = 0; we deduce that Lor — Aq — for 2 > 0, and since fo ^ n, Lq = L, Aq = A, we have Lr — A — nr 2 > 0. Hence L z — 4-tzA ^ (L — 2 nr) 2 and this is the result which we had to prove. This result is also contained in Bonnesen's argument. The Greeks never really got to grips with the isoperimetric problem. Their difficulties were of a different type from those that they encountered in their other famous unsolved problems, the duplication of the cube, the trisection of an angle and the squaring of a circle. These last three problems were unsolved because they were, and are, insoluble. It is just impossible to construct a segment of length 2* f *L given a segment of length L and using only straight edge and compasses, and similarly for the other problems. But in the case of the isoperimetric problem the solution was there and the Greeks knew that it was, and what it was. But the problem involved such a wealth of complex concepts that the Greeks were never able to define these ideas and to give the problem a satisfactory mathematical formulation. Today the problem has been generalized out of all recognition until finally the name isoperimetric problems has been applied by the distinguished mathematicians Polya and Szego to problems that are very remote from the original one but which, being physical phenomena determined by 120 EXPLORING UNIVERSITY MATHEMATICS 1 geometric properties, can be tackled by methods similar to those of J. Steincr. Some of these can be found in the references given below. References for further reading KoGLESTON, Convexity, Cambridge University Press. ECGLESTON, Problems in Euclidean Space, Pergamon Piress. LYUSTERNIK, Convex Figures and Polyhedra. PrtLYA and SZEGO , Isoperimetric Inequalities in Mathematical Physics, Princeton University Press. YAGLOM and BOLTYANSKH, Convex Figures, Holt Reinhart & Wiston. The seven lectures included in this book formed the programme of the 1965 Conference in Mathematics held at Bedford College, London. They are published in this volume because they were given by pro- fessional mathematicians on subjects of current mathematical interest, and will be of value to a far wider audience than that able to attend the Conference. The lectures are primarily designed for students who are in their last or penultimate years at school and who will sub- sequently take a mathematics degree, orone in which mathematics forms a major subject. Publication of the lectures in this book will also considerably extend the effect produced by their presentation at the Conference; that is, the achieving of an impact which greatly helps to create interest by introducing students to branches of mathematics which will be novel to them. An interesting point is that two of the lectures givea simplified accountof research work recently presented by the lecturer. m x -o o •t 5' to c 3 <" CO 2 =r 3 ..JLI 510